U. S. Army Research Office-Durham
Box  CM,  Duke  Station 
Durham,  North  Carolina 


FOREWORD 


Several  years  ago  the  Office  of  Ordnance  Research  (now  the  Army 
Research Office-Durham) organized an OOR Liaison Group on Computers.
Two meetings of this group were held, one in 1959 and the other in 1960,
to  exchange  information  of  interest  to  managers  of  ordnance  "other  than 
business"  computers.  The  Army  Mathematics  Steering  Committee  asked 
that  these  meetings  be  revived  and  placed  on  an  army-wide  basis.  The 
first  two  meetings  in  this  new  series  were  held,  one  in  1962  at  ARO-D  and 
the  other  in  1964  at  the  Harry  Diamond  Laboratories  and  the  National 
Bureau  of  Standards,  under  the  title  "ARO  Working  Group  on  Computers". 
The  1965  Conference  was  held  at  the  Ballistic  Research  Laboratories 
under  the  present  title  of  this  series;  namely,  "Army  Numerical  Analysis 
Conference".  The  1966  meeting  was  conducted  at  the  U.  S.  Army  Research 
Personnel Office, Washington, D. C.

The U. S. Army Mathematics Research Center, University of Wisconsin,
served as the host for the 1967 Army Numerical Analysis Conference.
It was held at the Wisconsin Center on 25-26 May 1967, and was
attended  by  over  58  scientists.  The  three  invited  addresses  were  delivered 
by  Professor  W.  Kahan,  D.  F.  Kennedy,  and  Dr.  Allen  Reiter.  They 
treated  respectively  topics  on  numerical  solutions  of  polynomial  equations, 
COSMIC, and interval arithmetic. Besides these talks there were nine
contributed  papers. 

Dr. Louis B. Rall served as Chairman on Local Arrangements. Those
in  attendance  were  indebted  to  him,  not  only  for  excellent  accommodations 
at  the  meeting,  but  also  for  organizing  a  large  portion  of  the  program. 

The  Chairman  of  the  conference,  Dr.  John  H.  Giese,  has  asked  that 
the  proceedings  of  this  meeting  be  published  and  issued  to  interested 
army scientists. He would like to thank, on behalf of the Army Mathematics
Steering Committee, the sponsor of these conferences, all the
speakers for their very interesting papers and the various chairmen for
their help in conducting this meeting. Thanks are also due to Professor
S. C. Kleene, Acting Director of MRC, for his interesting welcoming
remarks  and  for  having  his  installation  serve  as  host  for  this  conference. 
COMPUTER SOFTWARE MANAGEMENT AND INFORMATION CENTER

COSMIC 

Donald  F.  Kennedy 

COSMIC,  The  University  of  Georgia  Computer  Center 
Athens,  Georgia 


INTRODUCTION.  In  July  1966,  The  University  of  Georgia  was  awarded  a 
contract  by  the  National  Aeronautics  and  Space  Administration  to  establish 
and  operate  a  center  for  the  dissemination  of  computer  programs  and  computer 
information. This center, known as Computer Software Management and Information
Center (COSMIC), is working through the NASA Technology Utilization Office
at  the  Marshall  Space  Flight  Center  in  conjunction  with  other  NASA  Centers  and 
NASA  Headquarters.  Through  this  joint  effort,  computer  programs  and  computer 
information  developed  by  or  for  NASA  are  made  available,  at  minimal  costs,  to 
potential  users  in  industry,  business,  education,  and  other  sectors  of  our 
economy.  In  addition,  computer  programs  developed  by  or  for  the  Atomic  Energy 
Commission,  which  is  participating  in  the  NASA  Technology  Utilization  Program, 
are  also  made  available  through  COSMIC. 

PURPOSE.  One  of  the  primary  functions  of  the  NASA  Technology  Utilization 
Program  is  to  identify  technological  advances  derived  from  the  space  effort 
and  to  make  them  available  for  use  by  industry  and  business.  One  of  the  most 
useful  sources  of  technical  aid  and  information  to  many  organizations  is  a  wide 
range  of  well  documented,  operational  computer  programs  and  computer  information. 

By  making  these  computer  programs,  which  are  classified  as  new  technology, 
available  to  industry  and  business,  NASA  hopes  to  contribute  directly  to  the 
national industrial effort and offer companies the opportunity to avoid duplication
and to shorten the task of developing computer programs.

EXPERIENCE.  The  Computer  Center  at  the  University  of  Georgia  has  had 
extensive experience over the past four years in providing computer services and
assistance  in  computer  applications  to  approximately  sixty  industrial  and  business 
firms.  The  Center  employs  a  professional  staff  of  statisticians,  mathematicians, 
biologists,  numerical  analysts,  engineers,  chemists,  physicists,  and  information 
and  computer  scientists.  The  two  major  computer  systems  in  the  Center  are  the 
IBM  360  Model  65  and  the  IBM  7094  with  two  IBM  1401  systems  serving  as  input/ 
output  peripheral  units  for  the  7094.  In  addition,  an  IBM  1620  computer  and 
an  EAI  TR-20  analog  computer  are  operated  on  an  open-shop  basis. 

PROCEDURE. Under the original contract, NASA performed the evaluation of
computer  programs  and  forwarded  to  COSMIC  only  those  programs  and  documentations 
which  were  to  be  included  in  the  COSMIC  library.  However,  under  a  modification 
of  the  contract  in  December  1966,  the  University  of  Georgia  was  given  the  additional 
responsibility  of  evaluating  NASA  computer  programs.  Documentation  on  each 
program  is  forwarded  to  COSMIC  for  evaluation  to  determine  its  applicability  to 
a  variety  of  uses  for  industry,  business,  and  education.  If  the  program  is 
found  applicable,  a  more  in-depth  evaluation  is  performed  considering  such  factors 
as  soundness  of  logic,  accuracy  of  output,  and  completeness  of  the  documentation. 
After  the  evaluation,  a  recommendation  is  made  to  NASA  as  to  the  inclusion  or 
rejection of the program. If the computer program is included in the COSMIC
library,  it  is  announced  to  industry  and  business  by  both  NASA  and  COSMIC. 

In addition to the NASA computer programs, COSMIC has in its library
computer  programs  obtained  from  business  and  educational  firms. 

The  programs  are  disseminated  on  tape  or  in  card  form,  depending  upon 
the  requestor's  preference.  Each  requestor  is  charged  for  the  reproduction, 
handling,  and  mailing  of  programs. 

Documentation  may  be  requested  without  a  program,  if  desired.  Originally, 
documentation  was  disseminated  at  no  charge  to  the  requestor;  however,  in  an 
effort to become self-supporting, COSMIC has instituted a sliding-scale charge
for  documentation  based  on  a  fee  of  6  cents  per  page. 

A  directory  of  abstracts  of  computer  programs  available  from  COSMIC  is 
disseminated  periodically.  Interested  parties  can  receive  a  complimentary  copy 
by  writing  to: 

COSMIC, 

Computer  Center, 

University  of  Georgia 
Athens,  Georgia  30601 

CONCLUSION.  During  the  first  twelve  months  of  operation,  COSMIC  has 
had great success and growth. It has received requests for programs and information
from every section of the nation and, in fact, from every part of the
world,  even  from  countries  behind  the  iron  curtain. 

Based  on  its  present  success  and  growth,  COSMIC  should  become  the  largest 
disseminator  of  computer  programs  and  computer  information  in  the  nation  and 
should  have  one  of  the  most  complete  libraries  of  computer  programs  in  the  nation. 


MACHINE  LANGUAGE  PROGRAMMING 
HOW  AND  WHY* 


J.  M.  Yohe 

Mathematics  Research  Center,  U.  S.  Army 
Madison,  Wisconsin 

There  seems  to  be  a  feeling  in  some  quarters  that  Machine  Language 
programming  is  obsolete  --  or  at  least,  that  it  is  no  longer  useful  for 
everyday  applications.  This  feeling  is  largely  due  to  the  availability  of 
powerful  problem-oriented  languages  such  as  FORTRAN,  COBOL,  ALGOL, 
and  others.  With  these  languages  in  common  use,  the  argument  goes,  a 
person  needs  no  knowledge  of  Machine  Language;  the  compiler  does  all  of 
the  "dirty  work". 

This  is  evidenced  by  the  increasing  difficulty  of  using  machine  language 
in  programming.  For  example,  when  the  CDC  1604  computer  was  first 
installed  here  at  the  University  of  Wisconsin,  the  FORTRAN  compiler  allowed 
a  programmer  to  intermix  machine  language  and  FORTRAN  statements. 
However, when an improved FORTRAN compiler was released, this capability
was missing. And in some installations, the use of machine language
programming  is  actively  discouraged. 

It  is  indeed  tempting  to  believe  that  machine  language  programming  is 
obsolete,  as  anyone  who  has  ever  done  any  machine  language  programming 
will  attest.  There  is  a  considerable  amount  of  boring  detail  connected  with 
writing  a  program  in  machine  language,  and  I  am  the  first  to  want  to  dispense 
with  it.  However,  I  don't  believe  that  machine  language  is  dead  yet,  nor  do 
I  believe  that  the  need  for  it  will  disappear  in  the  near  future.  I  feel  that 
every  programmer  should  know  something  about  programming  in  machine 
language,  even  if  he  never  uses  it.  And  I  believe  that,  in  most  cases, 
significant  savings  in  computer  time,  man-hours,  and  dollars  can  result 
from  judicious  use  of  machine  language.  There  are  two  major  reasons  for 
this  contention:  First,  a  programmer  who  knows  machine  language  can 
write  more  efficient  programs  than  one  who  does  not  know  machine  language. 
This  is  true  whether  he  writes  his  programs  in  machine  language  or  in  one 
of  the  problem  oriented  languages.  Second,  a  knowledge  of  machine  language 
can  be  of  great  help  in  debugging  programs,  whether  they  are  written  in 
machine  language  or  not. 

There are still other benefits to be derived from machine language programming,
as we shall see presently.

*Sponsored  by  the  Mathematics  Research  Center,  U.  S.  Army,  Madison, 
Wisconsin under Contract No.: DA-31-124-ARO-D-462.
I do not intend to take anything away from those who conceive, implement,
and use the problem oriented languages. On the contrary, I feel
that these languages are vital. I would even go so far as to say that perhaps
most computer programming should be done in these languages. I do want
to convince you that these languages are not yet the answer to all programming
problems.

Let us first make a few remarks about how a person can go about
acquiring  a  knowledge  of  machine  language  programming. 

Perhaps the most important comment is that machine language programming,
like any other discipline, cannot be taught -- a person must
learn it. In learning, motivation is an important factor; the best motivation
for learning machine language programming is a need to know it. So
if  you  supervisors  want  machine  language  programming  to  be  used  in  your 
installation,  I  urge  you  to  encourage  your  programmers  to  use  it  in  those 
situations  where  it  would  be  of  value. 

The  best  way  to  learn  machine  language  programming  is  from  an 
experienced  programmer  in  a  working  situation.  The  person  who  is  writing 
a  machine  language  program  and  has  access  to  an  experienced  programmer 
will  learn  programming  quite  rapidly.  Barring  that,  some  textbooks  can 
give  a  person  a  good  grounding  in  the  fundamentals  of  machine  language 
programming,  and  for  certain  computers,  there  are  handbooks  available 
for  learning  --  for  example,  Machine  Language  Programming  for  the  CDC 
3600,  MRC  Technical  Summary  Report  No.  721,  which  will  appear  shortly. 
The  computer  reference  manual  is  usually  one  of  the  least  effective  ways 
of  learning  machine  language,  but  it  will  do  in  the  absence  of  any  other 
source. 

The  only  really  effective  way  of  learning  machine  language  programming, 
however,  is  by  doing  it. 

Why  is  machine  language  programming  worth  consideration? 

There  are  several  reasons.  First  and  perhaps  foremost,  machine 
language  programs  can  be  considerably  more  efficient  than  even  the  most 
skillfully  written  programs  in  problem  oriented  languages.  The  compilers, 
after  all,  are  general  purpose  programs,  designed  to  handle  a  wide  variety 
of  cases  with  acceptable  efficiency.  They  cannot,  therefore,  tailor  programs 
to  specific  situations;  to  do  so  would  require  additional  logic  in  the  compiler 
program  to  the  point  that  the  compiler  would  be  cumbersome  and  quite  slow. 
Consider,  for  example,  the  question  of  testing  whether  A  =  B.  The  usual 
method  of  making  this  test  is  to  subtract  B  from  A  and  test  the  result  for 
zero,  and  this  is  quite  an  acceptable  method.  If,  however,  B  happens  to  be 
zero already, there is no need to do the subtraction; we need only test A
for  zero.  However,  many  compilers  do  not  even  recognize  this  particularly 
simple  special  case;  they  will  compile  code  to  subtract  zero  from  A  and 
test  the  result  for  zero.  Clearly,  a  person  writing  a  program  in  machine 
language  could  easily  eliminate  the  extra  subtract  instruction  which  the 
compiler  would  generate.  Far  greater  economies  are  usually  possible  in 
more  complex  situations. 

Another  benefit  derives  from  a  programmer  knowing  machine  language. 

A  programmer  who  knows  machine  language  can  often  write  more  efficient 
programs  in  a  problem  oriented  language  than  a  programmer  who  knows 
only  the  problem  oriented  language.  The  programmer  familiar  with  machine 
language will know roughly how the compiler will translate the source statements
he writes, and he will be able to avoid situations which cause unneeded
instructions to be generated. He will understand, for example, exactly
what is involved in mixed-mode arithmetic (for example, dividing a floating-point
number by an integer) and will be able to make an educated decision
about  what  course  of  action  will  result  in  the  most  efficient  object  program. 
Moreover,  he  will  know  when  to  use  machine  language  and  when  to  stick  with 
the  problem-oriented  language. 

A  third  and  very  important  argument  for  a  programmer's  knowing 
machine  language  is  that  it  will  be  of  immeasurable  value  to  him  in  debugging 
his  programs,  whether  written  in  machine  language  or  in  a  problem  oriented 
language.  He  will  be  able,  for  example,  to  read  core  dumps,  understand 
what kinds of errors might cause a certain weird symptom, and even track
down  errors  generated  by  library  subroutines,  the  compiler  or  even  the 
computer  itself  (in  the  rare  instances  when  they  occur). 

We  turn  to  a  simple  example.  A  program  to  clear  an  array  to  zero  was 
written  for  the  CDC  3600,  first  in  FORTRAN  using  four  different  methods, 
and  then  in  machine  language.  Let  us  examine  the  source  statements  and 
the  code  generated  from  them,  and  then  the  machine  language  code  to  do  the 
same  thing. 

Example 1

      DO 10 I=1,10000
   10 A(I) = 0.0

            ENA     1
            STA     =SI
            LIL     I,1
            ENI     9999,2
            BSS     0
            ENA     0
            STA     A-1,1
            INI     1,1
            IJP     WS00001.,2

Example 1 is the traditional way of writing this program in FORTRAN,
and, it turns out, is also the most efficient way of doing it in this compiler.
Note, however, that the instruction ENA 0 (Enter A with zero) is executed on
every pass through the loop, even though the A-register is never changed in
the  loop  and  thus  always  contains  zero  anyhow.  Note  also  that  two  index 
registers  are  used,  whereas  one  would  have  been  sufficient. 


Example 2

      I=1
   10 A(I) = 0.0
      I=I+1
      IF(I.LE.10000) GO TO 10

            ENA     1
            STA     =SI
.10         ENA     0
            LIL     I,1
            STA     A-1,1
            LDA     I
            INA     1
            STA     =SI
            LAC     I
            INA     10000
            AJP,ZR  .100001
            AJP,MI  .100002
.100001     SLJ     .10
.100002


In Example 2, the DO-loop logic was abandoned and indexing was done
explicitly.  This  resulted  in  a  far  less  efficient  program,  although  a 
sophisticated  compiler  could  have  improved  it  considerably.  For  example, 
in  this  situation,  the  variable  I  could  have  been  kept  in  an  index  register. 
Moreover,  the  variable  I  is  already  in  the  A-register  when  LAC  I  is  executed; 
the compiler could have engineered matters so that I, rather than its complement,
was used in the subsequent instructions, and thus eliminated the
LAC I instruction. Note also that an extra jump instruction is executed at
the end of the loop; AJP,ZR .100001 could equally well have read
AJP,ZR .10 -- or even been eliminated in this case.
Example 3

      I=1
   10 A(I) = 0.0
      I=I+1
      IF(I-10000) 10,10,20

            ENA     1
            STA     =SI
.10         ENA     0
            LIL     I,1
            STA     A-1,1
            LDA     I
            INA     1
            STA     =SI
            INA     -10000
            AJP,ZR  .10
            AJP,MI  .10

Example  3  differs  from  Example  2  only  in  the  form  of  the  IF  statement. 
This  form  of  the  IF  statement  gave  a  more  efficient  object  code,  although 
many  of  the  remarks  concerning  Example  2  apply  equally  well  here. 


Example 4

      I=10000
   10 A(I) = 0.0
      I=I-1
      IF(I.NE.0) GO TO 10

            ENA     10000
            STA     =SI
.10         ENA     0
            LIL     I,1
            STA     A-1,1
            LDA     I
            INA     -1
            STA     =SI
            INA     -0
            AJP,ZR  .100002
.100001     SLJ     .10
.100002

In  Example  4,  "reverse"  indexing  was  used  (as  will  be  the  case  with 
Example  5,  which  is  the  machine  language  version  of  the  program).  Many 
of  the  remarks  concerning  Example  2  also  apply  to  Example  4.  Note  here 
that the IF(I.NE.0) statement generates an instruction which subtracts
zero  from  I  and  then  tests  the  result  for  zero.  Note  also  that  the  construct 

            AJP,ZR  .100002
.100001     SLJ     .10

could have been replaced by the single instruction

            AJP,NZ  .10


Example 5

            ENI     9999,1
            ENA     0
L           STA     A,1
            IJP     L,1

Example  5  is  the  machine  language  version  of  the  program.  Observe 
that  there  are  only  two  instructions  in  the  loop,  and  that  everything  done  in 
the  loop  must  be  done  in  the  loop,  while  everything  which  can  be  done  outside 
the  loop  is  done  outside  the  loop.  This  clearly  results  in  a  more  efficient 
program  than  even  the  most  efficient  program  generated  by  the  FORTRAN 
compiler. 
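
The per-pass difference can be made concrete by counting instructions. The following Python sketch is only a toy model of the Example 1 and Example 5 loops, not an emulation of the CDC 3600: the function names are hypothetical, and the behavior assumed for IJP (jump back and count the index register down while it is nonzero, fall through when it is zero) is an assumption read off the listings rather than taken from the reference manual.

```python
def run_compiled_loop(n):
    """Toy model of the Example 1 object code; returns (array, instructions executed)."""
    a = [1.0] * n
    executed = 4            # ENA 1, STA =SI, LIL I,1, ENI n-1,2 (setup, run once)
    b1 = 1                  # index register 1: the FORTRAN subscript I
    b2 = n - 1              # index register 2: passes still to be made
    while True:
        acc = 0.0           # ENA 0   (reloaded on every pass)
        a[b1 - 1] = acc     # STA A-1,1
        b1 += 1             # INI 1,1
        executed += 4       # ENA, STA, INI and the IJP below
        if b2 == 0:         # IJP (assumed): fall through once the counter is zero
            break
        b2 -= 1             # IJP (assumed): otherwise count down and jump back
    return a, executed


def run_hand_coded_loop(n):
    """Toy model of the Example 5 code; returns (array, instructions executed)."""
    a = [1.0] * n
    b1 = n - 1              # ENI n-1,1
    acc = 0.0               # ENA 0   (loaded once, outside the loop)
    executed = 2            # the two setup instructions above
    while True:
        a[b1] = acc         # L  STA A,1
        executed += 2       # STA and the IJP below
        if b1 == 0:         # IJP (assumed): fall through when the index is zero
            break
        b1 -= 1             # IJP (assumed): otherwise count down and jump back
    return a, executed


if __name__ == "__main__":
    n = 10000
    _, compiled = run_compiled_loop(n)
    _, hand = run_hand_coded_loop(n)
    print("compiled:", compiled, "hand-coded:", hand)
    print("extra instructions per pass:", (compiled - hand - 2) // n)
```

Under these assumptions the compiled loop executes four instructions per pass and the hand-coded loop two, which is the two-instruction difference discussed in the text.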

Comparing the most efficient FORTRAN program (Example 1) with the
machine language program (Example 5), we see that two extra instructions
are executed on each pass through the loop. The extra execution time is about
2 μs per pass. In 10,000 passes through the loop, this comes to about 20
milliseconds -- hardly worth considering. But if the procedure were to be
executed a hundred thousand times, those two instructions would take 2,000
seconds on the 3600. At 11¢ per second, those two innocuous-looking instructions
would cost $220.00!
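
The arithmetic behind this estimate is easy to check. The short Python sketch below uses the figures quoted in the text (about 2 microseconds of extra time per pass, 10,000 passes per call, 100,000 calls, 11 cents per second of machine time); the function and parameter names are hypothetical and chosen only for illustration.

```python
def extra_cost(extra_seconds_per_pass, passes_per_call, calls, dollars_per_second):
    """Return (extra machine seconds, extra dollars) caused by redundant instructions."""
    extra_seconds = extra_seconds_per_pass * passes_per_call * calls
    return extra_seconds, extra_seconds * dollars_per_second


# Figures taken from the text; the rate of $0.11 per second is the one quoted there.
seconds, dollars = extra_cost(2e-6, 10_000, 100_000, 0.11)
print(f"{seconds:.0f} s of extra machine time, costing ${dollars:.2f}")
```

The same function can be reused with an installation's own timing and billing figures to judge whether a given inner loop is worth hand coding.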

Let  us  now  consider  what  types  of  programs  should  ordinarily  be  written 
in  problem  oriented  languages  and  what  types  of  programs  stand  to  benefit 
from  being  written  in  machine  language. 

We  first  mention  a  few  cases  where  machine  language  programming 
should  not  ordinarily  be  used.  Programs  which  only  need  a  couple  of  minutes 
of  computer  time  can  usually  be  written  quite  economically  in  one  of  the 
problem-oriented  languages.  The  reason  for  this  is  that,  in  many  of  these 
cases,  system  overhead  is  responsible  for  a  significant  portion  of  the 
running  time.  There  simply  is  not  that  much  to  be  gained  by  speeding  up  the 
computation  itself  by  a  few  seconds.  Another  case  where  machine  language 
programming  might  be  a  mistake  is  when  answers  are  needed  in  a  hurry  -- 
that is, when total turnaround time, rather than computer time, is the overriding
consideration. In these cases, the longer time usually required to
write  and  debug  a  machine  language  program  might  cause  intolerable  delays. 
A third instance where machine language programming is not usually
indicated  is  the  case  of  the  "one-shot"  job,  where  the  program  will  be 
abandoned  or  significantly  changed  after  it  has  run  successfully.  In  this 
case,  the  computer  time  necessary  to  debug  a  machine  language  program 
could  well  cancel  any  saving  effected  by  writing  the  program  in  machine 
language. 

Where,  then,  would  machine  language  programming  be  worthwhile? 

The  most  obvious  place  is  in  programs  which  are  to  be  used  over  a  long 
period  of  time  with  no  changes  or  only  minor  changes.  If  machine  language 
programming  can  save  10%  on  a  program  which  will  run  for,  say,  1000 
hours during a year's time, the total saving will be 100 hours. If the computer
cost is $200.00 per hour, this would result in a dollar saving of
$20,000.00. This is a realistic figure.

There  are  two  other  places  where  machine  language  programming  can 
be  of  definite  value.  The  first  is  the  case  where  a  problem  can  be  handled 
far  more  efficiently  by  use  of  machine  language  programming  than  by  the 
use  of  one  of  the  problem  oriented  languages  due  to  special  circumstances. 

In this case, problems which were not economically feasible when programmed
in one of the problem oriented languages can become quite reasonable
when  written  in  machine  language.  The  second  is  the  case  where  it  is 
necessary  to  have  complete  control  over  the  exact  machine  operations  used 
as  well  as  the  sequence  in  which  they  are  used.  Such  would  be  the  case,  for 
example,  when  the  problem  required  strict  control  of  round-off  error.  The 
program written here at MRC for Professor Lowell Schoenfeld to locate
roots  of  the  Riemann  Zeta  function  falls  into  both  of  these  categories. 

Looking  at  the  program  for  this  Conference,  the  Newton  program,  to 
be described next, and interval arithmetic, to be described this afternoon,
both  use  machine  language  programming  to  good  advantage;  and  in  the  analysis 
of  round-off  errors,  which  will  be  covered  tomorrow,  a  knowledge  of  machine 
language  for  the  computer  in  question  is  almost  essential. 

In  summary,  then,  we  have  seen  that  knowledge  of  machine  language 
can  not  only  allow  a  programmer  to  write  machine  language  programs  when 
necessary,  but  it  can  also  help  him  to  write  more  efficient  programs  in  any 
language,  and  help  him  debug  programs  more  efficiently.  This  can  result 
in  significant  savings  in  both  time  and  money.  This  is  why  I  claim  that 
machine  language  programming  is  still  very  much  alive. 
NEWTON: A GENERAL PURPOSE PROGRAM
FOR SOLVING NONLINEAR SYSTEMS

Julia H. Gray and L. B. Rall

1. Introduction. A number of important problems which arise in practice
may be reduced to the computational problem of solving a system of equations
of the form

$$f_i(\xi_1, \xi_2, \ldots, \xi_n) = 0, \qquad i = 1, 2, \ldots, n. \tag{1.1}$$

In (1.1), the functions $f_i$ are assumed to be known, and the $\xi_i$ are the
unknowns, $i = 1, 2, \ldots, n$. It may be supposed that all values are real, since
in the case of complex values, (1.1) may be written as a system of $2n$ real
equations for the real and imaginary parts of the $\xi_i$ by setting the real and
imaginary parts of the $f_i$ equal to zero, $i = 1, 2, \ldots, n$.

This report describes an automatic computer program for solving systems
of the form (1.1) which was developed at the Mathematics Research Center for
the CDC 3600 computer operated by the University of Wisconsin Computing
Center. The program is iterative in character: it starts from a given initial
approximation and generates successive approximations to the solution of (1.1),

$$x^* = (\xi_1^*, \xi_2^*, \ldots, \xi_n^*), \tag{1.2}$$

until pre-assigned criteria of accuracy are met, or until divergence is indicated.
In the latter case, the program prints an appropriate message. The convergence
and error analyses are integral parts of the program. The program is general
purpose in that it will handle any system of the form (1.1), up to the limits set
by available core storage, in which the functions $f_i$ can be written in terms of
ordinary FORTRAN statements.

2. Theory. The system (1.1) can be considered to be an equation of
the form

$$F(x) = 0 \tag{2.1}$$

in the space $R^n$ of n-dimensional real vectors

$$x = (\xi_1, \xi_2, \ldots, \xi_n). \tag{2.2}$$

Here $F$ is the vector function, or operator, defined by

$$F(x) = (f_1(x), f_2(x), \ldots, f_n(x)), \tag{2.3}$$

which maps the vector $x$ into some other vector in $R^n$. A vector $x$ will be a
solution of equation (2.1) if it is mapped into the zero vector $0 = (0, 0, \ldots, 0)$
in $R^n$ by the operator $F$.

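The (Fréchet) derivative of the operator $F$ is the $n \times n$ matrix

$$F'(x) = \begin{pmatrix}
\dfrac{\partial f_1}{\partial \xi_1} & \dfrac{\partial f_1}{\partial \xi_2} & \cdots & \dfrac{\partial f_1}{\partial \xi_n} \\
\dfrac{\partial f_2}{\partial \xi_1} & \dfrac{\partial f_2}{\partial \xi_2} & \cdots & \dfrac{\partial f_2}{\partial \xi_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f_n}{\partial \xi_1} & \dfrac{\partial f_n}{\partial \xi_2} & \cdots & \dfrac{\partial f_n}{\partial \xi_n}
\end{pmatrix} \tag{2.4}$$

or, for brevity,

$$F'(x) = \left( \frac{\partial f_i}{\partial \xi_j} \right), \tag{2.5}$$

$i, j = 1, 2, \ldots, n$ [1]. $F'(x)$ is sometimes called the Jacobian matrix of the
system (1.1). The second derivative of $F$ is the $n \times n \times n$ array shown in
Figure 1, or

$$F''(x) = \left( \frac{\partial^2 f_i}{\partial \xi_j \, \partial \xi_k} \right), \tag{2.6}$$

$i, j, k = 1, 2, \ldots, n$ in condensed notation. For operational reasons, the
convention

$$\frac{\partial^2 f_i}{\partial \xi_j \, \partial \xi_k} = \frac{\partial}{\partial \xi_k} \left( \frac{\partial f_i}{\partial \xi_j} \right) \tag{2.7}$$

is adopted.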


The second derivative is a special type of bilinear operator

$$B = (b_{ijk}), \tag{2.8}$$

$i, j, k = 1, 2, \ldots, n$, in $R^n$ [1, 2].


In $R^n$, the norm $\|x\|$ of a vector $x$ will be defined to be

$$\|x\| = \max_{(i)} |\xi_i|. \tag{2.9}$$

Similarly, the norm $\|A\|$ of an $n \times n$ matrix $A = (a_{ij})$ is taken to be

$$\|A\| = \max_{(i)} \sum_{j=1}^{n} |a_{ij}|, \tag{2.10}$$

and the norm $\|B\|$ of an $n \times n \times n$ bilinear operator $B = (b_{ijk})$ is given by

$$\|B\| = \max_{(i)} \sum_{j=1}^{n} \sum_{k=1}^{n} |b_{ijk}|. \tag{2.11}$$

In (2.9) - (2.11), the index $i$ runs over the integers $1, 2, \ldots, n$.
If $F$ is differentiable at

$$x_0 = (\xi_1^{(0)}, \xi_2^{(0)}, \ldots, \xi_n^{(0)}), \tag{2.12}$$

then equation (2.1) may be written in the form

$$F(x_0) + F'(x_0)(x - x_0) + y_0 = 0 \tag{2.13}$$

in matrix-vector notation. $F'(x_0)$ is obtained by evaluating the partial
derivatives in (2.4) at $x = x_0$. The vector $y_0$ is small relative to $x - x_0$ in
the sense that

$$\lim_{\|x - x_0\| \to 0} \frac{\|y_0\|}{\|x - x_0\|} = 0. \tag{2.14}$$


If a solution of equation (2.1) is close to $x_0$, one may feel justified in
dropping $y_0$ to obtain the approximate linear equation

$$F(x_0) + F'(x_0)(x - x_0) = 0, \tag{2.15}$$

the solution $x = x_1$ of which will be

$$x_1 = x_0 - [F'(x_0)]^{-1} F(x_0), \tag{2.16}$$

provided that $[F'(x_0)]^{-1}$ exists. Set

$$G_0 = (g_{ij}^{(0)}) = [F'(x_0)]^{-1}. \tag{2.17}$$

In terms of the original system (1.1),

$$x_1 = (\xi_1^{(1)}, \xi_2^{(1)}, \ldots, \xi_n^{(1)}) \tag{2.18}$$

may be written

$$\xi_i^{(1)} = \xi_i^{(0)} - \sum_{j=1}^{n} g_{ij}^{(0)} f_j(\xi_1^{(0)}, \xi_2^{(0)}, \ldots, \xi_n^{(0)}), \tag{2.19}$$

$i = 1, 2, \ldots, n$.


On the assumption that $x_1$ is a better approximation than $x_0$ to a solution
$x = x^*$ of (2.1), the same process may be repeated with $x_0$ replaced by $x_1$
to obtain a further approximation $x_2$, and so on.


The generation of the sequence $\{x_m\}$ of successive approximations
by means of the relationship

$$x_{m+1} = x_m - [F'(x_m)]^{-1} F(x_m), \tag{2.20}$$

$m = 0, 1, 2, \ldots$, is called Newton's method for solving equation (2.1). In order
for the application of Newton's method to make sense from a computational
standpoint, it is necessary to have affirmative answers to the following questions:

(1) Does equation (2.1) have a solution $x = x^*$?

(2) Does the sequence generated by (2.20) exist and converge
to $x^*$?

(3) Is it possible to obtain an estimate (that is, an upper bound)
for the error $\|x_m - x^*\|$ of approximation of $x^*$ by $x_m$,
$m = 0, 1, 2, \ldots$?


At a given $x_0$, it is possible to settle these questions on the basis
of a theorem due to L. V. Kantorovič [1, 2].

Theorem. At $x = x_0$, suppose that $G_0 = [F'(x_0)]^{-1}$ exists,

$$\|G_0\| \le B_0, \tag{2.21}$$

$$\|x_1 - x_0\| = \|{-G_0} F(x_0)\| \le \eta_0, \tag{2.22}$$

and

$$\|F''(x)\| \le K \tag{2.23}$$

for $x$ in the set

$$V(x_0, r) = \{x : \|x - x_0\| \le r\}. \tag{2.24}$$

If

$$h_0 = B_0 \eta_0 K \le \frac{1}{2} \tag{2.25}$$

and

$$r \ge r_0 = \frac{1 - \sqrt{1 - 2h_0}}{h_0} \, \eta_0, \tag{2.26}$$



Then:

(1) Equation (2.1) has a solution $x^*$ in $V(x_0, r_0)$;

(2) The Newton sequence $\{x_m\}$ defined by (2.20) exists and
converges to $x^*$;

(3) The error estimate

$$\|x^* - x_m\| \le \frac{1}{2^{m-1}} (2h_0)^{2^m - 1} \eta_0 \tag{2.27}$$

is valid; in particular,

$$\|x^* - x_1\| \le 2 h_0 \eta_0. \tag{2.28}$$



Proofs of this theorem may be found elsewhere [1, 2]. It is used as the
basis  for  the  optional  automatic  convergence  and  error  analysis  features  of  the 
computer  program  described  in  this  report. 
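
As an illustration of how such a convergence and error analysis might be computed, the Python sketch below evaluates the quantities of the theorem from given bounds. It is not the NEWTON program's own routine. It assumes the standard statement of Kantorovič's theorem, namely $h_0 = B_0 \eta_0 K \le 1/2$, $r_0 = \eta_0 (1 - \sqrt{1 - 2h_0})/h_0$, and the error bound $(2h_0)^{2^m - 1} \eta_0 / 2^{m-1}$; the function names and the numerical inputs are hypothetical.

```python
import math


def kantorovich_radius(b0, eta0, k):
    """Return (h0, r0) if the hypothesis h0 <= 1/2 holds, otherwise None.

    b0   bound on the norm of the inverse Jacobian at x0
    eta0 bound on the norm of the first Newton step x1 - x0
    k    bound on the norm of the second derivative near x0
    """
    h0 = b0 * eta0 * k
    if h0 > 0.5:
        return None                      # the theorem gives no conclusion
    if h0 == 0.0:
        return h0, eta0                  # limiting value of r0 as h0 tends to 0
    r0 = (1.0 - math.sqrt(1.0 - 2.0 * h0)) / h0 * eta0
    return h0, r0


def error_bound(h0, eta0, m):
    """Assumed form of the bound on ||x* - x_m|| for the m-th Newton iterate."""
    return (2.0 * h0) ** (2 ** m - 1) * eta0 / 2.0 ** (m - 1)


if __name__ == "__main__":
    # Hypothetical bounds, chosen only to exercise the formulas.
    result = kantorovich_radius(1.0, 0.1, 2.0)
    if result is None:
        print("h0 > 1/2: convergence is not guaranteed from this x0")
    else:
        h0, r0 = result
        print("h0 =", h0, " r0 =", r0)
        for m in range(4):
            print("m =", m, " error bound =", error_bound(h0, 0.1, m))
```

Because the exponent $2^m - 1$ roughly doubles at each step, the bound shrinks very quickly once $2h_0 < 1$, which reflects the quadratic convergence of Newton's method.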

3. Generation of the Newton sequence. In order to generate the Newton sequence $\{x_m\}$ defined by (2.20), subroutines are needed to perform the following operations:

(1) Evaluate $F(x_0)$, that is, the n functions $f_i(\xi_1^{(0)}, \xi_2^{(0)}, \ldots, \xi_n^{(0)})$, $i = 1, 2, \ldots, n$.

(2) Evaluate $F'(x_0)$, which consists of the $n^2$ functions

    $\dfrac{\partial f_i}{\partial \xi_j}(\xi_1^{(0)}, \xi_2^{(0)}, \ldots, \xi_n^{(0)})$ ,  (3.1)

$i, j = 1, 2, \ldots, n$.

(3) Invert the matrix $F'(x_0)$, if possible, and form the vector defined by (2.16).

In addition, it is necessary to evaluate various norms in order to determine whether the iterative process should be continued with

    $x_0 := x_1$  (3.2)

or not.
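The three operations above can be sketched in a few lines. This is only an illustration, not the NEWTON code: the names `solve_linear` and `newton_step` are hypothetical, the test system is invented for the sketch, and the linear system $F'(x_0)\,d = -F(x_0)$ is solved by Gaussian elimination instead of forming the inverse explicitly as NEWTON does.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: Vector) -> Vector:
    """Solve a*d = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    d = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * d[c] for c in range(r + 1, n))
        d[r] = (aug[r][n] - s) / aug[r][r]
    return d


def newton_step(f: Callable[[Vector], Vector],
                jac: Callable[[Vector], Matrix],
                x0: Vector) -> Vector:
    """Return x1 = x0 + d, where F'(x0) d = -F(x0)."""
    d = solve_linear(jac(x0), [-v for v in f(x0)])
    return [xi + di for xi, di in zip(x0, d)]


# Invented 2-by-2 test system with root (1, 1).
def f(x: Vector) -> Vector:
    return [x[0] ** 2 + x[1] ** 2 - 2.0, x[0] - x[1]]


def jac(x: Vector) -> Matrix:
    return [[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]]


x1 = newton_step(f, jac, [2.0, 1.0])
print(x1)
```

Repeating the step with `x0 := x1`, as in (3.2), generates the Newton sequence.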

In the operation of NEWTON, the user supplies the functions $f_i$, $i = 1, 2, \ldots, n$, written in a form suitable for compilation by the CODEX program [3], which is essentially the same as for the FORTRAN compiler [4]. The CODEX program, which was developed at the Mathematics Research Center, prepares the subroutines for the evaluation of the n functions $f_i$, $i = 1, 2, \ldots, n$, and the $n^2$ derivatives $\partial f_i / \partial \xi_j$, $i, j = 1, 2, \ldots, n$. This relieves the user of a tedious chore, and removes a possible source of error. This takes care of (1) and (2).
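CODEX itself is not listed in this report, so the following is only a sketch of the underlying idea: differentiating an expression tree by rule. The tuple representation and the names `diff` and `evaluate` are assumptions made for the illustration; only the operators +, -, * and ** with a constant exponent are handled.

```python
from typing import Dict, Union

# An expression is a number, a variable name, or a triple (op, left, right).
Expr = Union[float, str, tuple]


def diff(e: Expr, var: str) -> Expr:
    """Differentiate the expression tree e with respect to var."""
    if isinstance(e, (int, float)):
        return 0.0
    if isinstance(e, str):
        return 1.0 if e == var else 0.0
    op, a, b = e
    if op in ("+", "-"):
        return (op, diff(a, var), diff(b, var))
    if op == "*":  # product rule
        return ("+", ("*", diff(a, var), b), ("*", a, diff(b, var)))
    if op == "**":  # power rule, constant exponents only in this sketch
        if not isinstance(b, (int, float)):
            raise ValueError("exponent must be a constant")
        return ("*", ("*", b, ("**", a, b - 1)), diff(a, var))
    raise ValueError(f"{op} NOT DIFFERENTIABLE")


def evaluate(e: Expr, env: Dict[str, float]) -> float:
    """Evaluate the expression tree e with variable values from env."""
    if isinstance(e, (int, float)):
        return float(e)
    if isinstance(e, str):
        return env[e]
    op, a, b = e
    x, y = evaluate(a, env), evaluate(b, env)
    if op == "+":
        return x + y
    if op == "-":
        return x - y
    if op == "*":
        return x * y
    if op == "**":
        return x ** y
    raise ValueError(f"unknown operator {op}")


# F3 = X ** 3 - Y, the third equation of the example in Section 7.
f3 = ("-", ("**", "X", 3), "Y")
point = {"X": 2.0, "Y": 5.0}
print(evaluate(diff(f3, "X"), point))  # value of 3*X**2 at X = 2
print(evaluate(diff(f3, "Y"), point))  # value of -1
```

Applying `diff` twice gives the second derivatives needed for the error estimation of Section 4.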

During  the  operation  of  CODEX,  one  of  the  following  error  messages  may 
be  printed  if  the  corresponding  restriction  is  violated. 

1.  "PARENTHESIS  ERROR  IN  DEFINITION  OF  name  of  function. "  Check 
the  parentheses  in  the  function  named,  correct  the  error  and  resubmit. 

2. "STORAGE INSUFFICIENT FOR COMPILING." This message signifies that the system of equations is too large for the program to handle. At present, the program will handle a system of 24 equations in 24 unknowns. The equations of that system are relatively sparse, however, and there is no guarantee that another system of that size could be handled. Unfortunately, the program occupies most of the storage available in the CDC 3600, so little can be done outside of rewriting the entire program when this message is received.

3. "name of storage area STORAGE INSUFFICIENT FOR DIFFERENTIATION." This message occurs when storage is exceeded during differentiation of the equations. As above, there is not much that can be done.

4.  "Name  NOT  DIFFERENTIABLE.  "  This  is  caused  by  attempting  to 
differentiate  an  operator  whose  derivative  has  not  been  defined.  Check  the 
equations. 

5.  "ILLEGAL  VARIABLE  DETECTED.  "  This  occurs  when  the  evaluating 
portion  of  the  program  comes  across  an  improperly  named  variable.  Check  for 
a  variable  whose  name  has  more  than  3  characters. 

The  matrix  operations  (3)  are  standard,  and  the  user  may  employ  any 
matrix  inversion  program  he  chooses,  provided  that  it  gives  the  indication  of 
failure 

ISING  *  0  (3.3) 

on  return  to  the  main  program.  The  routine  used  here  (INVERT)  is  a  slow  but 
accurate  program  which  uses  double  pivoting.  For  large  nonlinear  systems, 
such  as  arise  in  the  solution  of  nonlinear  elliptic  boundary  value  problems  [5], 
an  iterative  subroutine  may  be  required. 

If the matrix inversion fails, the program terminates by printing the message

    "DIVERGENCE INDICATED AT ITERATION NUMBER _ DUE TO
    FAILURE OF MATRIX INVERSION."

The matrix which the inversion subroutine failed to invert will be printed if the user desires. A new value of $x_0$ may be taken by the program at this point.


The  following  constants  are  calculated  by  NEWTON  for  comparison  with 
tolerances  provided  by  the  user: 

(1) $\|F(x_0)\|$ is compared to the given numbers F and FF. If

    $\|F(x_0)\| \le F$ ,  (3.4)

then $x^* = x_0$ (the current value of x) is taken to be a solution of (2.1). Following the message

    "SUCCESSFUL CONVERGENCE AT ITERATION NUMBER _ WITH
    NORMF = _ LESS THAN OR EQUAL TO _ .",

the values of $\xi_1^*, \xi_2^*, \ldots, \xi_n^*$ and $f_1(\xi_1^*, \ldots, \xi_n^*), \ldots, f_n(\xi_1^*, \ldots, \xi_n^*)$ are printed.

If

    $\|F(x_0)\| > FF$ ,  (3.5)

then it is assumed that the method is divergent, and the program is stopped (or given another value of $x_0$). This feature prevents generation of a sequence of useless values. The message printed in this case is

    "DIVERGENCE INDICATED AT ITERATION NUMBER _ AS NORMF
    = _ IS GREATER THAN _ ."

(2) The number

    $\|G\| = \|[F'(x_0)]^{-1}\|$  (3.6)

is compared to the given number BB. If

    $\|G\| > BB$ ,  (3.7)

then the program terminates the iteration and prints the message

    "DIVERGENCE INDICATED AT ITERATION NUMBER _ AS
    BOUND G = _ IS GREATER THAN _ ."

If the user desires, the matrix $G = [F'(x_0)]^{-1}$ will also be printed. Condition (3.7) is used for a divergence criterion for two reasons: A large value for $\|G\|$ indicates that $F'(x_0)$ may be singular or nearly singular, hence the components of G may be in error by large amounts; also, the value of $x_1$ will be inaccurate even if $F(x_0)$ is known fairly exactly.

(3) The quantity

    $\|x_1 - x_0\| = \|-[F'(x_0)]^{-1} F(x_0)\|$  (3.8)

is compared to the given numbers C and CC. If

    $\|x_1 - x_0\| \le C$ ,  (3.9)

then $x^* = x_1$ is taken to be a solution of (2.1). The message,

    "SUCCESSFUL CONVERGENCE AT ITERATION NUMBER _ WITH
    NORMCX = _ LESS THAN OR EQUAL TO _ ,"

is printed, followed by the values of $x^*$ and $F(x^*)$. If

    $\|x_1 - x_0\| > CC$ ,  (3.10)

divergence is assumed, and the program terminates the iteration and prints the message:

    "DIVERGENCE INDICATED AT ITERATION NUMBER _ AS
    NORMCX = _ IS GREATER THAN _ ."

(4) The total number of iterations m is compared with the given number LIMIT. If

    $m >$ LIMIT ,  (3.11)

divergence is assumed, the iteration terminates, and the message,

    "DIVERGENCE INDICATED AT ITERATION NUMBER _ AS THE
    NUMBER OF ITERATIONS HAS EXCEEDED _ ,"

is printed. This control prevents the computer from generating a sequence which flounders aimlessly.

(5) Finally, each iteration is timed, and the total elapsed time plus the time for the previous iteration is compared to TLIM, the number of milliseconds allowed for the total iteration by the user. If this estimate for total time at the end of the next iteration exceeds TLIM, the program prints the message

    "NOT ENOUGH TIME REMAINS FOR THE NEXT ITERATION,"

and the current values of x, F(x), and other parameters. This feature of the program prevents loss of information due to a time limit interrupt.
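The five controls above amount to a short decision procedure. The sketch below is a hedged illustration, not a transcription of NEWTON: the function names, the returned labels, and the order in which the tests are applied (taken here to be the order of the text) are assumptions. The tolerance values are those echoed in the second example of Appendix I.

```python
from dataclasses import dataclass


@dataclass
class Tolerances:
    F: float      # convergence tolerance on ||F(x0)||, see (3.4)
    FF: float     # divergence bound on ||F(x0)||, see (3.5)
    BB: float     # divergence bound on ||G||, see (3.7)
    C: float      # convergence tolerance on ||x1 - x0||, see (3.9)
    CC: float     # divergence bound on ||x1 - x0||, see (3.10)
    LIMIT: int    # maximum number of iterations, see (3.11)


def check_controls(norm_f: float, norm_g: float, norm_dx: float,
                   m: int, tol: Tolerances) -> str:
    """Classify one iteration; test order is an assumption of this sketch."""
    if norm_f <= tol.F:
        return "converged: NORMF"
    if norm_f > tol.FF:
        return "diverged: NORMF"
    if norm_g > tol.BB:
        return "diverged: BOUND G"
    if norm_dx <= tol.C:
        return "converged: NORMCX"
    if norm_dx > tol.CC:
        return "diverged: NORMCX"
    if m > tol.LIMIT:
        return "diverged: LIMIT"
    return "continue"


def enough_time(elapsed_ms: float, last_iter_ms: float, tlim_ms: float) -> bool:
    """Control (5): predict the time at the end of the next iteration."""
    return elapsed_ms + last_iter_ms <= tlim_ms


tol = Tolerances(F=1e-9, FF=1e6, BB=1e6, C=1e-9, CC=1e6, LIMIT=25)
# Values printed at iteration 2 of the example in Appendix I.
status = check_controls(4.79191714, 0.55, 0.283333333, 2, tol)
print(status)
```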

4. Error Estimation. An optional feature of NEWTON is automatic error estimation, using (2.28), which may be written

    $\|x^* - x_1\| \le 2 B_0 \eta_0^2 K$ .  (4.1)

The quantities

    $B_0 = \|[F'(x_0)]^{-1}\|$ ,  $\eta_0 = \|x_1 - x_0\|$  (4.2)

are available immediately from the computation of $x_1$ by the process described in Section 3. The only remaining quantity is the bound

    $K \ge \|F''(x)\|$  (4.3)

in a ball $V(x_0, r)$,

    $V(x_0, r) = \{x : \|x - x_0\| \le r\}$ ,  (4.4)

of sufficiently large radius r so that (2.26) will be satisfied if $h_0 \le \tfrac{1}{2}$.

Two  options  are  available  to  the  user: 

(1)  If  a  value  of  K  ,  or  a  special  method  for  computing  K  is  known, 
then  this  value,  or  a  subroutine  for  computing  K  ,  may  be  inserted  into  the 
program.  The  value  of  K  is  called  BNORM  in  the  program. 

(2) The program will form the second derivatives required for $F''(x)$ as given by (2.6), and estimate K by the use of interval arithmetic [6,7]. This estimation makes use of the program INTERVAL [8], which was developed at the Mathematics Research Center to add interval arithmetic to the modes of computation available on the CDC 1604 and CDC 3600.
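To illustrate how interval arithmetic yields a bound of this kind, here is a minimal sketch. It is not the INTERVAL program: the class below is hypothetical, supports only +, * and positive integer powers, and omits the outward rounding that a rigorous implementation needs. The function bounded is $192x^2$, the second derivative of $16x^4$ with respect to x, which appears in the example of Section 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        other = as_interval(other)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    __radd__ = __add__

    def __mul__(self, other):
        other = as_interval(other)
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    __rmul__ = __mul__

    def __pow__(self, n: int):
        # An even power of an interval containing zero has lower bound 0.
        if n % 2 == 0 and self.lo <= 0.0 <= self.hi:
            return Interval(0.0, max(abs(self.lo), abs(self.hi)) ** n)
        a, b = self.lo ** n, self.hi ** n
        return Interval(min(a, b), max(a, b))

    def mag(self) -> float:
        """Largest absolute value attained in the interval."""
        return max(abs(self.lo), abs(self.hi))


def as_interval(v) -> Interval:
    """Promote a plain number to a degenerate interval."""
    return v if isinstance(v, Interval) else Interval(float(v), float(v))


# Bound |192 x^2| for x within 0.5 of the starting value 1.
x = Interval(0.5, 1.5)
g111 = 192 * x ** 2
print(g111, g111.mag())
```

Evaluating every second derivative this way over the ball and combining the magnitudes in a norm gives an upper bound K; because intervals are widened by each operation, the bound is usually pessimistic.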

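To perform this estimation, subroutines for the evaluation of the $n^2(n+1)/2$ distinct second derivatives (2.6) in interval arithmetic are compiled by CODEX and INTERVAL. (Recall that

    $\dfrac{\partial^2 f_i}{\partial \xi_j\,\partial \xi_k} = \dfrac{\partial^2 f_i}{\partial \xi_k\,\partial \xi_j}$ ,  (4.5)

$i, j, k = 1, 2, \ldots, n$.) Each derivative

    $g_{ijk}(x) = g_{ijk}(\xi_1, \xi_2, \ldots, \xi_n) = \dfrac{\partial^2 f_i}{\partial \xi_j\,\partial \xi_k}$  (4.6)

is evaluated as an interval-valued function of the interval vector

    [expression illegible in the source]  (4.7)

with components which are the intervals

    [the defining expressions and the text through equation (4.15) are missing from the source]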

The  iteration  is  assumed  to  be  divergent,  and  is  terminated.  This  situation  is 
indicated  by  the  message 


"DIVERGENCE  INDICATED  AT  ITERATION  NUMBER _ AS 

HO  =  _  IS  GREATER  THAN  _ .  " 

If

    $h_0 \le HH$ ,  (4.16)

then the error bound (2.28),

    $\|x^* - x_1\| \le 2h_0\eta_0$ ,  (4.17)

is calculated, and compared to a preassigned number E. If

    $\|x^* - x_1\| > E$ ,  (4.18)

then the program performs another iteration with $x_0 := x_1$. If

    $\|x^* - x_1\| \le E$ ,  (4.19)


then $x_1$ is regarded as being a sufficiently accurate approximation to the solution of $F(x) = 0$, and the program prints the message:

    "SUCCESSFUL CONVERGENCE AT ITERATION NUMBER _ WITH
    ERROR = _ LESS THAN _ ,"

followed by the values of $x_1$ and $F(x_1)$.
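The decision just described can be sketched as follows. The function names and returned labels are hypothetical. The extra test $h_0 \le \tfrac{1}{2}$ is an assumption added here because the bound is only justified under (2.25); it agrees with the Appendix I output, where ERROR is first printed once HO falls below one half. The numbers are those read from iteration 5 of the second example in Appendix I.

```python
from typing import Tuple


def error_estimate(b0: float, eta0: float, k: float) -> Tuple[float, float]:
    """Return h0 = B0*eta0*K and the bound 2*h0*eta0 of (4.17)."""
    h0 = b0 * eta0 * k
    return h0, 2.0 * h0 * eta0


def decide(b0: float, eta0: float, k: float, hh: float, e: float) -> str:
    h0, bound = error_estimate(b0, eta0, k)
    if h0 > hh:
        return "diverged"      # HO IS GREATER THAN HH
    if h0 > 0.5:
        return "iterate"       # assumption: bound not applicable yet
    if bound > e:
        return "iterate"       # (4.18): another iteration with x0 := x1
    return "converged"         # (4.19)


# BOUND FPRIME INVERSE, NORM CX and BOUND F DBL PRIME at iteration 5.
b0, eta0, k = 6.61437802e-1, 4.37403606e-4, 2.57938953e2
h0, bound = error_estimate(b0, eta0, k)
print(h0, bound, decide(b0, eta0, k, hh=1.0e6, e=1.0e-8))
```

With these inputs the computed h0 agrees with the printed HO = 7.46256805-002, and the bound is still larger than E, so one more iteration is taken.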

During the operation of the program while using the error estimation option, all of the controls described in Section 3 remain in effect. The automatic error estimation feature, using INTERVAL, lengthens the computation time for each iteration considerably. In the case of a simple system of three equations in three unknowns to be presented later as an example, this amounts to a factor of ten. Consequently, unless an error estimate is of great moment, one of the other parameters could be taken as an accuracy control, perhaps after a test run on a typical case with error estimation shows that some other criterion is reliable. Because of the rapid convergence of Newton's method, as shown by (2.27), the price of an extra iteration or two to obtain a value of $\|x_1 - x_0\|$ or $\|F(x_0)\|$ which is smaller than necessary for the required accuracy is probably less than that of the automatic error estimation procedure.

It is, of course, possible to become fanatical about rigorous error estimation. One may note that the computation of $F(x_0)$, $F'(x_0)$, $[F'(x_0)]^{-1}$, and thus $x_1$, are subject to round-off error, so that one obtains some $\tilde{x}_1$ instead of the $x_1$ called for by the theory in Section 2. If one can estimate $\|x_1 - \tilde{x}_1\|$ by interval methods [6,7] or by other procedures [9,10], then

    $\|x^* - \tilde{x}_1\| \le \|x^* - x_1\| + \|x_1 - \tilde{x}_1\| \le 2h_0\eta_0 + \|x_1 - \tilde{x}_1\|$  (4.20)

is a rigorous bound, as long as $B_0$, $\eta_0$, and thus $h_0$, are upper bounds for the corresponding exact quantities. (In using automatic error analysis, the factor of overestimation of K usually dominates the much smaller errors in the calculation of $x_1$, so that (4.17) gives a correct, if pessimistic, result.) In addition, there can be errors in the coefficients of the system to be solved, or limitation on the accuracy with which they are known. This gives rise to an uncertainty error in the numerical solution. Also, if the system to be solved is a finite approximation to a differential or integral equation, there is a discretization error due to the method of approximation used. Analysis of these errors is completely outside the scope of this paper.

5. Flow chart. The structure of the program, which was described above in narrative fashion, is shown geometrically by the flow-chart in figures 2, 3, and 4.

Figure 2. Initialization

Figure 3. Iteration

[Figure 4: caption not recovered. Legible boxes: "Calculate h = ||Δx|| * ||[F'(x)]^{-1}|| * K"; "Calculate Error = 2h ||Δx||"; "Indicate convergence".]

6. Input. Explanations of format designations may be found in [4].

The first input card contains, in I5 format, the number of systems of equations to be read in and solved during the run. The second input card, which is read in under 3E20.8 format, contains in columns 01-20 F, the value of the tolerance to be allowed on $\|F(x_n)\|$. Columns 21-40 of the second card contain FF, the upper bound for $\|F(x_n)\|$. Columns 41-60 of the second input card contain BB, the upper bound on the norm of the inverse of the matrix of partial derivatives.

The third input card, also read in under 3E20.8 format, contains in the first 20 columns C, the tolerance on the norm of the increment vector. Columns 21-40 of the third input card contain CC, the upper bound on the norm of the increment vector. Columns 41-60 of the third input card contain TLIM, the time in milliseconds allowed for the iteration section of the program.
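For readers unfamiliar with fixed-column card formats, the following sketch reads a card the way a 3E20.8 or I5 specification would slice it. The helper `parse_fields` is hypothetical. It treats an all-blank field as zero, as FORTRAN input does, but does not reproduce other FORTRAN conventions such as embedded blanks or an implied decimal point.

```python
from typing import Callable, List


def parse_fields(card: str, width: int, count: int,
                 convert: Callable[[str], float]) -> List[float]:
    """Cut `count` fields of `width` columns from a card and convert them."""
    card = card.ljust(width * count)
    values = []
    for i in range(count):
        field = card[i * width:(i + 1) * width].strip()
        values.append(convert(field) if field else convert("0"))
    return values


# Second card (3E20.8): F in columns 1-20, FF in 21-40, BB in 41-60.
card2 = f"{'1.0E-09':>20}{'1.0E+06':>20}{'1.0E+06':>20}"
f_tol, ff_bound, bb_bound = parse_fields(card2, 20, 3, float)

# An option card read as fifteen 5-column integers; blank fields give 0.
card4 = "".join(f"{v:5d}" for v in [1, 1, 1, 1, 25, 1])
options = parse_fields(card4, 5, 15, int)
print(f_tol, ff_bound, bb_bound, options)
```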

The next input card supplies the program with parameters which will determine what options are to be used as well as several iteration limits. The card is read in under a 15I5 format. If columns 1-5 are zero, the error analysis subroutine will not be used; otherwise, the subroutine will be called. Columns 6-10 indicate whether or not the matrix of the partial derivatives is to be printed out in case the inversion of this matrix fails. If columns 6-10 are zero, the matrix will not be printed out; otherwise, a printout will be given. If columns 11-15 are not zero, a printout of the inverse of the Jacobian matrix will be given when the norm of this matrix exceeds the given upper bound and divergence of the system is thus indicated. If columns 11-15 are zero, no printout will be given. Columns 16-20 indicate how often a printout of the intermediate values of the Newton sequence is desired. If this printout is desired every time, then there should be a 1 in column 20; if every other time, column 20 should be 2, etc. Columns 21-25 give the number of iterations to be allowed in searching for a solution. This number must be right adjusted in the field. Columns 26-30 give the number of sets of starting values which are to be used with the system of equations. If columns 36-40 are zero, there will be a printout of the formulas which CODEX makes up for the given equations, the partial derivatives, and the second partial derivatives. If columns 36-40 are not zero, no printout will be given. Columns 41-45 need be used only if the error routine is being used. If so, then column 45 is 1 if the norm of the matrix of the second partial derivatives is not known and must be computed. If the norm is known, then column 45 is 2.

The next input card is supplied only if the error analysis routine is to be used. Otherwise, it should not be present. This card is read in under 3E20.8 format. Columns 1-20 of this card contain the allowed tolerance E on the error bound. Columns 21-40 contain the upper bound HH on the convergence constant. Columns 41-60 need be supplied only when the norm K of the second derivative is known. These columns then contain this norm.

The next data cards contain the names of the independent variables and their starting values. The names of the variables are limited to three non-blank alphanumeric characters, the first of which must be an alphabetic character. The first name may be punched in any of the first 72 columns of the card. It is followed by at least one blank and the starting value corresponding to the variable. The starting value must be followed by at least one blank, and then the name of the next independent variable and its starting value are given. When all of the independent variables and their starting values have been given, the last entry is followed by at least one blank and a $. The starting values may be given as a fixed point integer, a floating-point number with a decimal point, or a FORTRAN E-format number. The numbers may be signed or unsigned.
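The rules for the variable cards can be sketched as a small parser. The function `parse_variables` is hypothetical and is not taken from NEWTON. It also strips a trailing comma from each token, an assumption prompted by the echo "X 1, Y 1, Z 1, $" in Appendix I; the text itself only requires blanks.

```python
import re
from typing import Dict, List

# At most three alphanumeric characters, the first alphabetic.
NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9]{0,2}$")


def parse_variables(cards: List[str]) -> Dict[str, float]:
    """Read name/value pairs up to the terminating $ (columns 1-72 only)."""
    tokens: List[str] = []
    for card in cards:
        tokens.extend(t.rstrip(",") for t in card[:72].split())
    tokens = [t for t in tokens if t]
    if "$" not in tokens:
        raise ValueError("missing terminating $")
    tokens = tokens[:tokens.index("$")]
    if len(tokens) % 2:
        raise ValueError("variable name without a starting value")
    values: Dict[str, float] = {}
    for name, value in zip(tokens[0::2], tokens[1::2]):
        if not NAME_RE.match(name):
            raise ValueError(f"ILLEGAL VARIABLE {name}")
        values[name] = float(value)  # integer, decimal, or E-format
    return values


start = parse_variables(["X 1, Y 1.0, Z +1.0E0, $"])
print(start)
```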

The last group of data cards contains the equations for which a solution is to be found. These must be in the form F(x) = 0 where F(x) is an arithmetic expression using any of the operations +, -, *, /, ** and/or any of the transcendental functions sine(x), SINF(x), cosine(x), COSF(x), (natural) log(x), LOGF(x), exp(x), EXPF(x), and arctangent(x), ATAN(x). F(x) is then given to the program in the form

    variable name = F(x) .

The above formula must be punched with at least one blank between consecutive symbols. As with the independent variables, only the first 72 columns of a card are significant. The formula may be continued on any number of consecutive cards and is terminated by a blank and a $ following the last symbol of F(x).

The program reads in the independent variables until it encounters a $. Then it expects to read in as many equations as independent variables, and it separates and counts these by the $ at the end. If a $ is misplaced, any of the error returns 1, 4, and 5 from CODEX, as well as an unchecked EOF, are equally likely to occur.
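A sketch of the counting rule for the equation cards follows. The function `split_equations` is hypothetical; unlike the program described here, it reports a misplaced $ explicitly instead of running into a CODEX error or an unchecked EOF.

```python
from typing import Dict, List


def split_equations(cards: List[str], n_vars: int) -> Dict[str, str]:
    """Split blank-separated symbols into equations terminated by $."""
    tokens = " ".join(card[:72] for card in cards).split()
    equations: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        if tok == "$":
            equations.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        raise ValueError("last equation is not terminated by $")
    if len(equations) != n_vars:
        raise ValueError(
            f"expected {n_vars} equations, found {len(equations)}")
    parsed: Dict[str, str] = {}
    for eq in equations:
        if len(eq) < 3 or eq[1] != "=":
            raise ValueError("equation must have the form name = F(x)")
        parsed[eq[0]] = " ".join(eq[2:])
    return parsed


cards = [
    "F1 = 16. * X ** 4 + 16. * Y ** 4 + Z ** 4 - 16. $",
    "F2 = X ** 2 + Y ** 2",   # continued on the next card
    "+ Z ** 2 - 3. $",
    "F3 = X ** 3 - Y $",
]
eqs = split_equations(cards, 3)
print(eqs)
```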

7. An example. The results of computation of the solution of the system

    $16x^4 + 16y^4 + z^4 - 16 = 0$
    $x^2 + y^2 + z^2 - 3 = 0$    (6.1)
    $x^3 - y = 0$

in the first octant are shown in Appendix I with and without automatic error estimation. The initial approximation was taken to be

    $x_0 = (1, 1, 1)$ .  (6.2)
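The example can be reproduced with a short script. This is a sketch under stated assumptions, not the NEWTON program: the derivatives are written out by hand instead of being generated by CODEX, the 3-by-3 inverse is formed by cofactors, and the norms are taken to be the maximum norm for vectors and the maximum row sum for matrices, which is consistent with the values printed in Appendix I.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def F(v: Vector) -> Vector:
    x, y, z = v
    return [16 * x**4 + 16 * y**4 + z**4 - 16, x * x + y * y + z * z - 3,
            x**3 - y]


def J(v: Vector) -> Matrix:
    x, y, z = v
    return [[64 * x**3, 64 * y**3, 4 * z**3],
            [2 * x, 2 * y, 2 * z],
            [3 * x * x, -1.0, 0.0]]


def inverse3(a: Matrix) -> Matrix:
    """Inverse of a 3-by-3 matrix by cofactors (adjugate over determinant)."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = a
    c11 = a22 * a33 - a23 * a32
    c12 = -(a21 * a33 - a23 * a31)
    c13 = a21 * a32 - a22 * a31
    c21 = -(a12 * a33 - a13 * a32)
    c22 = a11 * a33 - a13 * a31
    c23 = -(a11 * a32 - a12 * a31)
    c31 = a12 * a23 - a13 * a22
    c32 = -(a11 * a23 - a13 * a21)
    c33 = a11 * a22 - a12 * a21
    det = a11 * c11 + a12 * c12 + a13 * c13
    return [[c11 / det, c21 / det, c31 / det],
            [c12 / det, c22 / det, c32 / det],
            [c13 / det, c23 / det, c33 / det]]


def newton(x: Vector, c: float = 1e-9,
           limit: int = 25) -> Tuple[Vector, list]:
    history = []
    for _ in range(limit):
        g = inverse3(J(x))
        fx = F(x)
        step = [-sum(g[i][j] * fx[j] for j in range(3)) for i in range(3)]
        norm_g = max(sum(abs(e) for e in row) for row in g)
        norm_dx = max(abs(s) for s in step)
        x = [xi + si for xi, si in zip(x, step)]
        history.append((x, norm_dx, norm_g))
        if norm_dx <= c:
            break
    return x, history


root, history = newton([1.0, 1.0, 1.0])
for x, norm_dx, norm_g in history:
    print(x, norm_dx, norm_g)
```

The first line printed corresponds to the values listed under iteration number 2 in Appendix I.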

Other applications of this program have been made to finding characteristic values and vectors of matrices [11, 12], and solutions of systems arising in magnetohydrodynamic problems [13]. Its performance in every case has been satisfactory.

8. Warning. The complete program, except for unmodified subroutines of CODEX [3] and INTERVAL [8], is listed in Appendix II. Many subroutines are in CDC 3600 machine language [14], or use constants peculiar to the CDC 3600. Consequently, it is doubtful that the program as listed will work at any other installation, or survive future changes in the operating systems program at the University of Wisconsin Computing Center. However, the listing given, together with the description given above, should be a reliable guide for the adaptation of this program for use elsewhere.

APPENDIX I

(1) Example without automatic error estimation.

[beginning of listing not recovered]

0003T = Y ** 0004C
0004T = 0012C * 0003T
0005T = 0002T + 0004T
0006T = Z ** 0004C
0007T = 0005T + 0006T
F1 = 0007T - 0012C

F2 = X ** 2 + Y ** 2 + Z ** 2 - 3. $

CODE LIST FOR F2
0010T = X ** 0002C
0011T = Y ** 0002C
0012T = 0010T + 0011T
0013T = Z ** 0002C
0014T = 0012T + 0013T
F2 = 0014T - 0003C

F3 = X ** 3 - Y $

CODE LIST FOR F3
0015T = X ** 0003C
F3 = 0015T - Y

CODE LIST FOR 1, 1, 0
0016T = X ** 0003C
0017T = 0004C * 0016T
0020T = 0012C * 0017T
011 = + 0020T

CODE LIST FOR 1, 2, 0
0021T = Y ** 0003C
0022T = 0004C * 0021T
0023T = 0012C * 0022T
012 = + 0023T

CODE LIST FOR 1, 3, 0
0024T = Z ** 0003C
0025T = 0004C * 0024T
013 = + 0025T

CODE LIST FOR 2, 1, 0
0026T = 0002C * X
021 = + 0026T

CODE LIST FOR 2, 2, 0
0027T = 0002C * Y
022 = + 0027T

CODE LIST FOR 2, 3, 0
0030T = 0002C * Z
023 = + 0030T

CODE LIST FOR 3, 1, 0
0031T = X ** 0002C
0032T = 0003C * 0031T
031 = + 0032T

ITERATION NUMBER 1

NORM F = 1.70000000+001
X = 1.00000000+000    F1 = 1.70000000+001
Y = 1.00000000+000    F2 = 0.00000000+000
Z = 1.00000000+000    F3 = 0.00000000+000
TIME PER ITERATION = 70.00 MILLISECONDS

ITERATION NUMBER 2

NORM F = 4.79191714+000
NORM CX = 2.83333333-001
BOUND FPRIME INVERSE = 5.50000000-001
X = 9.29166667-001    F1 = 4.79191714+000
Y = 7.87500000-001    F2 = 1.30451389-001
Z = 1.28333333+000    F3 = 1.46966869-002
TIME PER ITERATION = 125.00 MILLISECONDS

ITERATION NUMBER 3

NORM F = 6.45309522-001
NORM CX = 9.43241407-002
BOUND FPRIME INVERSE = 5.56439198-001
X = 8.87074529-001    F1 = 6.45309522-001
Y = 6.93175859-001    F2 = 1.20773905-002
Z = 1.32086464+000    F3 = 4.86417094-003
TIME PER ITERATION = 460.00 MILLISECONDS

ITERATION NUMBER 4

NORM F = 1.84509452-002
NORM CX = 1.59811520-002
BOUND FPRIME INVERSE = 6.34254336-001
X = 8.78244398-001    F1 = 1.84509452-002
Y = 6.77194707-001    F2 = 4.28336556-004
Z = 1.33060980+000    F3 = 2.06810364-004
TIME PER ITERATION = 132.00 MILLISECONDS

ITERATION NUMBER 5

NORM F = 1.47977844-005
NORM CX = 4.37403606-004
BOUND FPRIME INVERSE = 6.61437802-001
X = 8.77965993-001    F1 = 1.47977844-005
Y = 6.76757304-001    F2 = 3.29047907-007
Z = 1.33085521+000    F3 = 2.04207026-007
TIME PER ITERATION = 127.00 MILLISECONDS

SUCCESSFUL CONVERGENCE AT ITERATION NUMBER ... [remainder of output not recovered]


(2) Example with automatic error estimation.

F = 1.00000000-009
FF = 1.00000000+006
BB = 1.00000000+006
C = 1.00000000-009
CC = 1.00000000+006
TLIM = 1.20000000+005
IERR = 1    IMAT = 1    IOMP = 1
NITCIS = 1    LIMIT = 25    IAGAIN = 1
NCF = 1    NPRT = 0    IAVL = 1
E = 1.00000000-008
HH = 1.00000000+006
BNORM = -0.00000000+000

NEWTONS METHOD

X 1, Y 1, Z 1, $

F1 = 16. * X ** 4 + 16. * Y ** 4 + Z ** 4 - 16. $

CODE LIST FOR F1
0001T = X ** 0004C
0002T = 0012C * 0001T
0003T = Y ** 0004C
0004T = 0012C * 0003T
0005T = 0002T + 0004T
0006T = Z ** 0004C
0007T = 0005T + 0006T
F1 = 0007T - 0012C

F2 = X ** 2 + Y ** 2 + Z ** 2 - 3. $

CODE LIST FOR F2
0010T = X ** 0002C
0011T = Y ** 0002C
0012T = 0010T + 0011T
0013T = Z ** 0002C
0014T = 0012T + 0013T
F2 = 0014T - 0003C

F3 = X ** 3 - Y $

CODE LIST FOR F3
0015T = X ** 0003C
F3 = 0015T - Y

CODE LIST FOR 1, 1, 0
0016T = X ** 0003C
0017T = 0004C * 0016T
0020T = 0012C * 0017T
011 = + 0020T

CODE LIST FOR 1, 2, 0
0021T = Y ** 0003C
0022T = 0004C * 0021T
0023T = 0012C * 0022T
012 = + 0023T

CODE LIST FOR 1, 3, 0
0024T = Z ** 0003C
0025T = 0004C * 0024T
013 = + 0025T

CODE LIST FOR 1, 1, 1
5671T = X ** 0002C
5672T = 0003C * 5671T
5673T = 0004C * 5672T
5674T = 0012C * 5673T
111 = + 5674T

CODE LIST FOR 1, 1, 2
112 NOT ON F-LIST.

CODE LIST FOR 1, 1, 3
113 NOT ON F-LIST.

CODE LIST FOR 1, 2, 2
5671T = Y ** 0002C
5672T = 0003C * 5671T
5673T = 0004C * 5672T
5674T = 0012C * 5673T
122 = + 5674T

CODE LIST FOR 1, 2, 3
123 NOT ON F-LIST.

CODE LIST FOR 1, 3, 3
5671T = Z ** 0002C
5672T = 0003C * 5671T
5673T = 0004C * 5672T
133 = + 5673T

CODE LIST FOR 2, 1, 0
0026T = 0002C * X
021 = + 0026T

CODE LIST FOR 2, 2, 0
0027T = 0002C * Y
022 = + 0027T

CODE LIST FOR 2, 3, 0
0030T = 0002C * Z
023 = + 0030T

CODE LIST FOR 2, 1, 1
211 = + 0002C

CODE LIST FOR 2, 1, 2
212 NOT ON F-LIST.

CODE LIST FOR 2, 1, 3
213 NOT ON F-LIST.

CODE LIST FOR 2, 2, 2
222 = + 0002C

CODE LIST FOR 2, 2, 3
223 NOT ON F-LIST.

CODE LIST FOR 2, 3, 3
233 = + 0002C

CODE LIST FOR 3, 1, 0
0031T = X ** 0002C
0032T = 0003C * 0031T
031 = + 0032T

CODE LIST FOR 3, 2, 0
032 = - 0001C

CODE LIST FOR 3, 3, 0
033 NOT ON F-LIST.

CODE LIST FOR 3, 1, 1
5671T = 0002C * X
5672T = 0003C * 5671T
311 = + 5672T

CODE LIST FOR 3, 1, 2
312 NOT ON F-LIST.

CODE LIST FOR 3, 1, 3
313 NOT ON F-LIST.

CODE LIST FOR 3, 2, 2
322 NOT ON F-LIST.

CODE LIST FOR 3, 2, 3
323 NOT ON F-LIST.

CODE LIST FOR 3, 3, 3
333 NOT ON F-LIST.

ITERATION NUMBER 1

NORM F = 1.70000000+001
X = 1.00000000+000    F1 = 1.70000000+001
Y = 1.00000000+000    F2 = 0.00000000+000
Z = 1.00000000+000    F3 = 0.00000000+000
TIME PER ITERATION = 64.00 MILLISECONDS

ITERATION NUMBER 2

NORM F = 4.79191714+000
NORM CX = 2.83333333-001
BOUND FPRIME INVERSE = 5.50000000-001
BOUND F DBL PRIME = 9.71960000+002
HO = 1.51463767+002
X = 9.29166667-001    F1 = 4.79191714+000
Y = 7.87500000-001    F2 = 1.30451389-001
Z = 1.28333333+000    F3 = 1.46966869-002
TIME PER ITERATION = 1928.00 MILLISECONDS

ITERATION NUMBER 3

NORM F = 6.45309522-001
NORM CX = 9.43241407-002
BOUND FPRIME INVERSE = 5.56439198-001
BOUND F DBL PRIME = 4.48856898+002
HO = 2.35585457+001
X = 8.87074529-001    F1 = 6.45309522-001
Y = 6.93175859-001    F2 = 1.20773905-002
Z = 1.32086464+000    F3 = 4.86417094-003
TIME PER ITERATION = 1943.00 MILLISECONDS

ITERATION NUMBER 4

NORM F = 1.84509452-002
NORM CX = 1.59811520-002
BOUND FPRIME INVERSE = 6.34254336-001
BOUND F DBL PRIME = 2.85088868+002
HO = 2.88969353+000
X = 8.78244398-001    F1 = 1.84509452-002
Y = 6.77194707-001    F2 = 4.28336556-004
Z = 1.33060980+000    F3 = 2.06810364-004
TIME PER ITERATION = 1767.00 MILLISECONDS

ITERATION NUMBER 5

NORM F = 1.47977844-005
NORM CX = 4.37403606-004
BOUND FPRIME INVERSE = 6.61437802-001
BOUND F DBL PRIME = 2.57938953+002
HO = 7.46256805-002
ERROR = 6.52830835-005
X = 8.77965993-001    F1 = 1.47977844-005
Y = 6.76757304-001    F2 = 3.29047907-007
Z = 1.33085521+000    F3 = 2.04207026-007
TIME PER ITERATION = 1853.00 MILLISECONDS

SUCCESSFUL CONVERGENCE AT ITERATION NUMBER 5
WITH ERROR = 3.79255563-011 LESS THAN 1.00000000-008
X = 8.77965760-001    F1 = 4.65661287-010
Y = 6.76756971-001    F2 = 5.82076609-011
Z = 1.33085541+000    F3 = 0.00000000+000


APPENDIX II

Listing of NEWTON and Relevant Subroutines (July, 1967), not including CODEX and INTERVAL.


EXPERIENCE WITH FORMAC AT
HARRY DIAMOND LABORATORIES

David  S.  Marsh 
Harry  Diamond  Laboratories 
Washington,  D.  C. 

[ABSTRACT. FORMAC is an experimental language and compiler, written by IBM, which allows the manipulation of algebraic symbols in much the same way that FORTRAN manipulates numerical values. It incorporates such FORTRAN features as subscripting and the DO loop capability. FORTRAN statements can be included in a FORMAC program so that results can be derived symbolically and evaluated numerically in the same program. FORMAC is particularly useful in those long, tedious algebraic problems which are so subject to copying and other errors when done with pencil and paper.

The  paper  describes  several  small  practice  problems  with  which 
programmers  became  familiar  with  the  language,  its  operation,  and  some 
of  the  commands.  One  larger  problem  is  included,  that  of  forming  the 
determinant  of  a  matrix,  the  elements  of  which  are  algebraic  expressions.  ] 

FORMAC (FOrmula MAnipulation Compiler) is a combination of a compiler and a language which makes possible the manipulation of algebraic symbols as symbols, according to the rules of algebra, in the computer. FORTRAN, in comparison, performs in much the same manner with numbers.

FORMAC was written by IBM's Boston Advanced Programming department at Cambridge, Massachusetts. It is still an experimental system and was released unofficially for tests under actual operating conditions and to find out just what capabilities the computing community thought such a system should have.

Actually,  FORMAC  for  the  IBM  7090/7094  is  no  longer  being  developed 
by  IBM  since  they  are  working  on  software  (including  an  improved  FORMAC) 
for  the  360  series  computers.  Under  the  auspices  of  SHARE,  however,  a 
group  at  Wright-Patterson  Air  Force  Base  is  taking  over  the  further 
development  of  this  system^. 

FORMAC  has  been  available  at  Harry  Diamond  Laboratories  since 
early  in  1966.  Some  of  our  uses  of  it  will  be  presented  here. 


^Proceedings of SHARE XXVIII, p. 4-93.

This paper is not a detailed tutorial discourse on FORMAC. It is, rather, a brief description, with some simple examples, of the language and its use. Hopefully, with this information you may be able to judge for yourselves whether FORMAC would be of value to you in your own operations.

FORMAC  was  written  as  an  addition  to  and  extension  of  FORTRAN. 
FORTRAN  and  FORMAC  statements  may  be  intermixed  in  a  program.  A 
FORMAC  program  goes  through  a  pre-processor  which  translates  FORMAC 
statements  into  FORTRAN  "CALL"  statements.  The  program  then  goes 
to  the  FORTRAN  compiler.  During  execution  as  a  FORTRAN  program, 
the  former  FORMAC  statements  call  special  subroutines  (added  to  the 
FORTRAN  library)  to  accomplish  their  purposes. 


Statements      Operators

LET             +
SUBST           -
EXPAND          *
COEFF           /
PART            **
ORDER           FMCEXP
EVAL            FMCLOG
FIND            FMCSIN
MATCH           FMCCOS
CENSUS          FMCATN
BCDCON          FMCHTN
ALGCON          FMCFAC
ERASE           FMCDFC
AUTSIM          FMCOMB
FMCDMP          FMCDIF
ATOMIC
DEPEND
PARAM
SYMARG

Figure  1.  FORMAC  Commands 

Figure 1 shows a list of available commands which gives a fair idea
of FORMAC's capabilities. There are fifteen operators, which perform
the purely mathematical functions, and nineteen declarative and executable
statements used to define terms at the beginning of the program, to
manipulate expressions in various ways, and for various "housekeeping"
purposes during the run. The mathematical operators are largely
self-explanatory; most have direct FORTRAN counterparts. Among those
which don't are FMCFAC and FMCDFC, which perform the factorial and
double factorial functions, respectively. Similarly, FMCOMB performs
the combinatorial function. FMCDIF performs differentiation.

The  use  of  many  of  the  declarative  and  executable  statements  is 
illustrated  in  program  listings  later  in  the  paper. 

Two  obvious  omissions  from  the  mathematical  operator  list  are 
commands  for  integrating  and  factoring.  There  exist  no  general  algorithms 
for  these  processes. 

In  general,  FORMAC  seems  best  suited  to  performing  relatively 
simple  mathematical  operations  on  relatively  large  and  complicated 
algebraic  expressions.  The  sample  problems  will  illustrate  this  and 
show  how  some  of  the  commands  are  used. 

The  first  two  problems  are  the  generation  and  differentiation  of  the 
Lagrange  interpolation  formula.  During  the  application  of  the  Method  of 
Steep  Descent,  it  is  desirable  to  find  the  value  of  X  corresponding  to  the 
minimum  point  on  a  parabola  passed  through  three  known  points.  Given 
three  points,  the  Lagrange  formula  (Fig.  2)  yields  the  value  of  Y,  lying 
on  the  parabola  which  passes  through  the  known  points,  for  any  value  of 
X.  Differentiating  the  formula,  setting  the  results  equal  to  zero,  and 
solving  for  X  gives  an  expression  which  locates,  in  X,  the  minimum 
point  of  the  parabola. 
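
The calculation just described can be sketched numerically in Python. This is an illustration only, not the FORMAC program of the paper: it works with numbers rather than symbols, and the function names `lagrange_coefficients` and `parabola_minimum_x` are hypothetical. The two nested loops mirror the subscript pattern of the Lagrange formula, and the minimum is located by setting the derivative 2AX + B to zero.

```python
from fractions import Fraction


def lagrange_coefficients(xs, ys):
    """Return (A, B, C) of Y = A*X**2 + B*X + C through three points."""
    a = b = c = Fraction(0)
    for i in range(3):                 # outer loop: one term per Y_i
        denom = Fraction(1)
        others = []
        for j in range(3):             # inner loop: factors with j != i
            if j != i:
                denom *= xs[i] - xs[j]
                others.append(xs[j])
        w = Fraction(ys[i]) / denom
        # (X - p)(X - q) = X**2 - (p + q)*X + p*q
        a += w
        b -= w * (others[0] + others[1])
        c += w * others[0] * others[1]
    return a, b, c


def parabola_minimum_x(xs, ys):
    """Solve dY/dX = 2*A*X + B = 0 for X; requires A > 0 for a minimum."""
    a, b, _ = lagrange_coefficients(xs, ys)
    if a <= 0:
        raise ValueError("the parabola through these points has no minimum")
    return -b / (2 * a)


# Three points taken from Y = (X - 2)**2 + 1
print(lagrange_coefficients([0, 1, 4], [5, 2, 5]))
print(parabola_minimum_x([0, 1, 4], [5, 2, 5]))
```

Exact rational arithmetic is used so that the recovered coefficients can be compared directly with the parabola the points were taken from.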


\[
Y = Y_1 \frac{(X-X_2)(X-X_3)}{(X_1-X_2)(X_1-X_3)}
  + Y_2 \frac{(X-X_1)(X-X_3)}{(X_2-X_1)(X_2-X_3)}
  + Y_3 \frac{(X-X_1)(X-X_2)}{(X_3-X_2)(X_3-X_1)}
\]

Figure  2.  Lagrange  Interpolation  Formula 


The  pattern  of  the  subscripts  in  the  formula  suggests  the  operation 
of  two  nested  DO  loops.  The  inner  loop  would  manipulate  the  subscripts 
within  a  term  while  the  outer  loop  would  multiply  in  an  appropriately 
subscripted  Y  and  sum  up  the  expression.  These  loops  formed  the  basis 
of  the  program  to  generate  the  formula.  (See  Figure  3.) 

Figure  3  also  shows  intermediate  results  at  the  end  of  the  first  and 
second  executions  of  the  inner  loop,  the  first  execution  of  the  outer  loop, 
and  the  complete  expression  at  the  end  of  the  third  and  last  execution  of 
the  outer  loop.  These  results  show  XX  for  the  subscripted  X  of  the 
formula  and  W  for  the  subscripted  Y  terms.  Figure  4  shows  the  final 
form  of  the  results  with  X  substituted  for  XX  and  Y  for  W.  It  also  shows 
the  effect  of  a  special  output  subroutine,  supplied  by  IBM,  which  yields 
a format closer to normal algebra than the standard FORMAC output.

\[ Y = AX^2 + BX + C \]

\[ \frac{dY}{dX} = 2AX + B = 0 \]

\[ X = -\frac{B}{2A} \]


Figure  5.  Operations  Performed  on  Lagrange  Formula 


The  second  problem  is  to  perform  on  the  generated  Lagrange  formula 
the  operations  shown  in  Figure  5.  Figure  6  shows  the  program,  and 
Figure  7  shows  the  results  before  and  after  substituting  Y  for  W  and  X 
for XX. Notice that the division of the coefficients in the answer is only
implied  by  enclosing  the  denominator  in  parentheses  and  raising  it  to  the 
-1  power. 

The third problem is an example of the type of simple mathematics
mentioned earlier: forming the determinant of a 3x3 matrix.^2 Figure 8
shows the elements of the matrix; each is an algebraic expression. Not
only that, but each of the 42 underlined terms represents another
algebraic expression which must be substituted. Before the substitution
expressions are put into the matrix elements, however, there are
substitutions to be made among themselves.

The sequence of operations to be accomplished is:

1.  Make  the  substitutions  among  the  substitution  terms. 

2.  Put  the  new  substitution  terms  into  the  matrix  elements. 

3.  Set  B  equal  to  zero,  a  condition  of  the  original  problem. 

4.  Form  the  determinant. 

This  is  exactly  the  type  of  "dog-work"  which  is  so  subject  to  error 
and  thus  so  frustrating  when  done  by  hand.  If  N  people  do  such  a  job, 
with  a  requirement  for  accurate  final  results,  there  are  usually  at  least 
N  different  results  to  be  reconciled.  This  is  also  just  the  type  of  problem 
for  which  FORMAC  was  created. 

Figure 9 shows the factors before and after their internal
substitutions. Only eight of the original nine terms are of further interest, since
the FI term appears only in the others and not in the matrix elements,
but the remaining eight are larger.


^2 Generation of the matrix is described in HDL TR-1316, "An Equation
for Phase Velocities in a Partially Ionized Gas", H. D. Curchack and
F. T. Harris, Harry Diamond Laboratories, Washington, D. C. 20438.

Figure 10 shows the matrix elements after the substitution of the
enlarged factors. Where they could originally be printed on 19 lines,
they now cover 99 lines. Three of the matrix elements go to zero when
B is set to zero (Figure 11), and they are so located in the matrix (Figure
12) that a fourth element, A32, is eliminated from the determinant. The
determinant is now the difference between the products along the two
major diagonals (Figure 13).

A11        A12 = 0    A13

A21 = 0    A22        A23

A31        A32        A33

Figure 12. Location of Zero Elements

It  is  unquestionably  messy,  but  cleaning  it  up  by  hand  is  certainly  far 
easier  than  obtaining  it  from  the  original  matrix  by  hand. 
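
The collapse of the determinant can be checked by writing out the full 3x3 expansion. The worked equation below is a reconstruction, not taken from the paper: the extracted copy of Figure 12 marks A12 and A21 as zero, and the third zero is assumed here to be A23, since that is the only placement for which A32 drops out while both diagonal products survive.

\[
\det A = A_{11}A_{22}A_{33} + A_{12}A_{23}A_{31} + A_{13}A_{21}A_{32}
       - A_{13}A_{22}A_{31} - A_{11}A_{23}A_{32} - A_{12}A_{21}A_{33}
\]

With $A_{12} = A_{21} = 0$ and (by assumption) $A_{23} = 0$, every term containing $A_{32}$ vanishes and

\[
\det A = A_{11}A_{22}A_{33} - A_{13}A_{22}A_{31} .
\]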

This  has  been  a  brief  description  of  some  of  the  capabilities  of 
FORMAC. In areas for which it is suitable, it can be a very useful tool.

Figure 11. Matrix Elements After B Set to Zero


A SIMPLE ELECTRONIC TRUE RANDOM EVENT GENERATOR

D.R.  Koehler,  J.T.  Grissom,  and  R.G.  Polk 
U.S.  Army  Missile  Command,  Redstone  Arsenal,  Alabama 


ABSTRACT. A device is proposed which will generate a uniform series
of random binary digits. This device could be considered an electronic
equivalent of a coin-flipping machine in that its output is a continuous
series of binary digits with successive digits having exactly equal
probabilities of being "1" or "0". Such a device would be ideally suited to
the on-line production of random numbers for use in Monte Carlo calculations
by digital computers. With suitable combinatorial logic, generation of
random pulses or random analog signals could easily be accomplished. The
device  as  presently  conceived  is  small,  compact,  uncritical,  and  requires 
little  power.  Using  the  space-randomness  of  particle  emission  from  a 
radioactive  source  and  two  small  semi-conductor  detectors  as  a  signal 
generator,  plus  a  few  readily  available  integrated  micro-circuit  packages, 
the  device  could  be  packaged  on  a  medium-sized  circuit  board.  Interfacing 
to  any  of  the  present  generation  of  digital  or  hybrid  computers  would 
present  no  problems,  and  the  bit  generation  rate  could  be  adjusted  to 
satisfy  the  demand  rate  of  the  fastest  of  today's  computers. 

Computer  technology  presently  has  reached  such  a  state  of  development 
that  today  computer  systems  are  being  built  which  are  so  large  that  seemingly 
the  necessary  software  and  programs  to  utilize  them  cannot  be  produced.  The 
burgeoning  field  of  computer  systems  application  is  working  overtime  searching 
for  ways  and  means  to  fully  occupy  the  vast  capabilities  of  the  very  large 
computer  systems,  and  problems  which  seemed  impossible  of  solution  by  any 
computer  technique  a  few  years  ago  are  beginning  to  yield  to  new  approaches 
made  possible  by  these  large  new  machines.  In  particular,  one  long-popular 
but computationally expensive numerical technique known as the "Monte Carlo
calculation" is seeing a period of rapid development as a line of attack on
problems which would not yield to ordinary analytical and numerical
techniques. The long-standing problem with most Monte Carlo programs is their
requirement  for  random  numbers  in  large  quantities. 

Computer-users in the areas of statistical sampling and simulation,
Monte Carlo calculations, and the promising new field of "stochastic"
computation  so  far  have  been  steadily  handicapped  by  the  difficulty  of 
obtaining  high-quality  random  numbers  for  their  programs.  In  particular, 
the  stochastic  computer  requires  numbers  in  great  quantity  and  of  high 
quality,  and  speed  of  computation  is  directly  dependent  on  the  rate  at  which 
random  numbers  can  be  provided  to  the  computer.  Computer  designers  so 
far  seem  to  have  virtually  ignored  this  problem  altogether,  leaving  it  up 
to  the  programmers  to  somehow  devise  a  technique  of  getting  numbers. 

The  common  techniques,  up  to  this  time,  have  been  the  insertion  of  tables 
of  random  numbers  in  the  computer  memory,  or  the  calculation  of  "pseudo-random" 
numbers  arithmetically  via  a  short  in-computer  program  using  any  one  of  quite 
a number of possible algorithms. Both of these approaches suffer from requiring
memory space, and both are limited in the quantity and quality of numbers
which can be supplied. Furthermore, algorithmic solutions require
non-negligible amounts of computer time. The real solution to the problem
will come when a good random number generator can be built which will
produce all manner of random numbers any program or computer may require and
which can be hooked up to the computer directly.

Attempts have been made to construct random number devices, and their
history makes interesting reading. But the end product of most of these
attempts  seems  generally  to  have  been  slow  in  speed,  cumbersome,  unwieldy, 
and  unsuited  for  direct  connection  to  the  computer;  or  else  complicated, 
sophisticated,  lacking  stability,  and  requiring  much  careful  adjustment 
and  attention.  We  shall  not  take  time  to  discuss  any  of  these  devices 
here.  The  interested  reader  will  find  references  on  some  of  these  devices 
in  the  bibliography. 

We  propose,  as  have  many  others  interested  in  this  problem,  a  device 
based upon the random nature of the decay of radioactive substances.
However, instead of mixing radioactivity detectors with clock-pulse generators
and  observing  the  time-randomness  of  emission  of  nuclear  particles,  as  has 
been  the  traditional  approach,  we  would  like  to  combine  two  reasonably 
identical  and  independent  nuclear  detector  systems  whose  average  count  rates 
are  exactly  equal.  The  time-and-space  randomness  of  the  decay  of  the 
radionuclide  then  requires  that  at  any  given  instant  of  time  there  be 
exactly  equal  probabilities  that  either  detector  will  receive  the  next 
particle.  If  one  detector  were  labeled  "heads"  and  the  other  "tails",  the 
output  pulses  of  the  two  detectors  would  be  just  as  good  for  decision  making 
as  the  ubiquitous  coin,  and  much,  much  faster. 

The  proposed  device  is  shown  schematically  in  Figure  1.  The  "sandwich" 
of  detectors  and  radioactive  source  can  be  made  quite  compact.  It  could  be 
fitted  on  one  corner  of  a  single  printed-circuit  board,  or  even  on  a  single 
chip  of  silicon  which  at  the  same  time  could  carry  some  of  the  necessary 
active  electronics.  The  source  strength  even  for  very  high  count  rates 
could  be  relatively  weak  and  quite  harmless  -  less  damaging  than  an  ordinary 
radium  watch  dial.  Using  ordinary  silicon  semiconductor  radiation  detectors, 
the  device  could  be  made  to  pump  out  random  binary  bits  at  a  rate  fast 
enough even for the "stochastic" computers; and as computer technology
advances,  the  permissible  bit  generation  rate  can  advance  with  it,  since 
virtually  all  the  associated  electronics  can  be  digital  and  will  benefit 
from  improvements  in  digital  techniques. 

The  "sandwich"  of  Figure  1  is  not  exactly  a  proper  configuration  for 
direct  connection  to  any  user  device,  such  as  a  computer.  First  of  all, 
the detector signals are small and must be amplified. Then some means
must  be  incorporated  to  convert  the  amplified  detector  pulses  to  the 
necessary  logic  levels  for  feeding  the  user  device.  In  Figure  2  we  see 
a  possible  realization  of  a  generator  of  serial  binary  bits.  The  "conversion" 
flip-flop  is  triggered  by  the  detector  pulses  into  "1"  or  "0"  states  and 
thus  provides  logic  levels  representing  the  two  binary  digits.  These  digits 
are produced one after the other in serial fashion by "inspecting" the logic
levels every time a detector pulse appears at the "clock" output and
delivering  to  the  user  device  the  proper  binary  bit  as  determined  by  the 
state  of  the  flip-flop. 

For  a  random  pulse  generator,  or  some  sort  of  special  noise  generator, 
this configuration might serve admirably. But a computer likes its input
to be more regular; the time-randomness of binary output of this serial
generator would be unacceptable to the computer systems designer.
Therefore some sort of buffer memory must be incorporated. Possibly the
easiest  solution  to  this  problem  is  the  addition  of  a  shift  register  which 
is  driven  by  the  outputs  of  the  serial  binary  generator.  This  is  shown 
in  Figure  3.  Any  time  the  computer  desired  a  new  random  number,  it  could 
sample  the  state  of  the  shift  register  and  transfer  its  contents  via 
parallel-access  lines  to  the  processor,  or  else  the  transfer  of  digits 
from  the  generator  to  the  register  could  be  temporarily  halted  while  the 
number contained in the register is clocked out at the computer clock rate
and  fed  to  the  computer  serially  from  the  back  of  the  shift  register. 
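
The signal path described above (two detectors, a conversion flip-flop, and a shift register read in parallel) can be imitated in software. The Python sketch below is only a simulation under stated assumptions: Python's pseudo-random generator stands in for the radioactive source, each detector is modeled as an independent Poisson pulse stream, and the names `detector_pulses`, `ShiftRegister`, and `random_words` are hypothetical rather than taken from the paper.

```python
import random


def detector_pulses(rate_a, rate_b, n_pulses, rng):
    """Yield 'A' or 'B' for each pulse, merging two independent Poisson streams."""
    t_a = rng.expovariate(rate_a)      # arrival time of next pulse in detector A
    t_b = rng.expovariate(rate_b)      # arrival time of next pulse in detector B
    for _ in range(n_pulses):
        if t_a < t_b:
            yield "A"
            t_a += rng.expovariate(rate_a)
        else:
            yield "B"
            t_b += rng.expovariate(rate_b)


class ShiftRegister:
    """Fixed-width register; each clock pulse shifts one new bit in."""

    def __init__(self, width):
        self.width = width
        self.bits = 0

    def clock_in(self, bit):
        mask = (1 << self.width) - 1
        self.bits = ((self.bits << 1) | bit) & mask

    def read(self):
        return self.bits               # parallel read of the whole register


def random_words(n_words, width=8, seed=1967):
    rng = random.Random(seed)          # stand-in for the physical source
    reg = ShiftRegister(width)
    words = []
    pulses = detector_pulses(1.0, 1.0, n_words * width, rng)
    for k, which in enumerate(pulses, start=1):
        flip_flop = 1 if which == "A" else 0   # conversion flip-flop state
        reg.clock_in(flip_flop)                # every pulse also acts as clock
        if k % width == 0:                     # register holds a fresh word
            words.append(reg.read())
    return words


print(random_words(5))
```

Reading the register only after `width` new pulses corresponds to the parallel-transfer option in the text; a computer that sampled more often would see overlapping bits.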

The ultimate choice of means of converting the detector "sandwich"
pulses into numbers in the computer will be up to the computer designer.
Our suggestions are only for illustrating the possibilities. For the
sake of simplicity, we have so far ignored one very important additional
element  of  the  total  generator.  That  element  consists  of  the  means  by 
which  the  generator  is  stabilized  so  as  to  maintain  the  exactly  equal  count 
rates  we  presupposed  as  the  necessary  condition  for  true  randomness.  For 
certain  types  of  radioactive  sources  and  preamplifiers,  this  "stabilization" 
can  be  so  simple  as  a  micrometer  adjustment  of  the  position  of  the  source 
between  the  detectors  -  the  inherent  counting  stability  of  the  remainder 
of  the  system  will  be  high  enough  that  over  periods  of  perhaps  a  year  or 
more  between  maintenance  checks  the  drift  and  count  rate  inequality  will 
be  quite  negligible. 

Unfortunately, the type of source presupposed above could be rather
"hot" as radioactive sources go, and might prove something of a problem
around  a  computer.  Also  the  adjustment  mechanism  would  be  somewhat  bulky 
relative  to  the  size  of  the  rest  of  the  system.  A  better  approach  probably 
would  be  the  use  of  feedback  stabilization.  For  example,  in  Figure  4  we 
have  added  an  up-down  scaler  which  continually  measures  the  difference  in 
the number of "1's" and the number of "0's," and if the difference exceeds
a  certain  value,  to  be  determined  by  statistical  considerations,  then  an 
adjustment  of  the  count  rate  in  the  one  channel  would  be  made  via  the  second 
up-down  scaler,  DAC,  and  discriminator.  This  sort  of  stabilization  scheme 
is  basically  digital,  with  a  step-wise  adjustment  of  the  relative  count 
rates,  which  should,  after  a  stabilization  period,  lead  to  a  steady-state 
condition  in  which  the  statistical  probabilities  of  the  two  binary  states 
fluctuate  very  slightly  about  the  exact  50%  level. 
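
A rough feel for this feedback scheme can be had from a small simulation. The Python sketch below is an assumption-laden illustration, not the circuit of Figure 4: the threshold, the step size, and the initial rate imbalance are invented values, the discriminator and DAC are reduced to a single stepwise change of one channel's count rate, and the function name `stabilize` is hypothetical.

```python
import random


def stabilize(rate_a=1.2, rate_b=1.0, threshold=100, step=0.01,
              n_bits=300_000, seed=7):
    """Return the probability of a "1" after stepwise feedback adjustment."""
    rng = random.Random(seed)          # stand-in for the physical source
    diff = 0                           # up-down scaler: ones minus zeros
    for _ in range(n_bits):
        p_one = rate_a / (rate_a + rate_b)
        bit = 1 if rng.random() < p_one else 0
        diff += 1 if bit else -1
        if diff > threshold:           # too many ones: lower channel A
            rate_a = max(rate_a - step, step)
            diff = 0
        elif diff < -threshold:        # too many zeros: raise channel A
            rate_a += step
            diff = 0
    return rate_a / (rate_a + rate_b)


print(stabilize())
```

In this model the starting probability of a "1" is 1.2/2.2, about 0.545, and the stepwise corrections pull it toward 0.5, after which it wanders slightly about that level, as the text anticipates.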

Having  conceived  the  device,  we  naturally  are  curious  as  to  just  how 
good  it  might  be.  Unfortunately,  it  is  not  within  our  mission  to  do  device 
development  such  as  this,  so  we  have  not  been  able  to  obtain  and  patch-up 
the necessary logical elements to test it. However, some spare detectors,
amplifiers, and a paper tape punch were temporarily rigged to punch random
bits  in  paper  tape.  The  system  had  no  provision  for  stabilization,  and 
count  rates  were  crudely  adjusted  to  something  near  equality  in  both  channels 
simply  by  adjusting  channel  gains.  Something  over  10*  bits  were  punched 
out, which we converted to cards and then gave to our Computation Center for
testing.  Considering  the  small  sample  we  had  to  work  with  and  the  consequent 
rather  large  variance  to  be  expected  on  any  given  test,  no  real  conclusions 
could  be  developed  as  to  the  quality  of  the  numbers.  All  results  of  all 
tests,  however,  were  within  statistical  expectations  based  upon  the  known 
relative numbers of ones and zeros and otherwise assuming complete random-


PROGRAMMING INTERVAL ARITHMETIC AND APPLICATIONS


Allen  Reiter 

Lockheed  Missiles  and  Space  Company 
Palo  Alto,  California 

INTRODUCTION.  This  paper  discusses  the  current  state-of-the-art  in 
interval  arithmetic,  both  from  the  programming  point  of  view  and  from  the 
point  of  view  of  applications  to  date. 

Interval  arithmetic  was  first  developed  formally  by  R.E.  Moore  around 
1960,  although  there  is  essentially  nothing  new  in  the  concepts  involved. 
Moore  originally  envisioned  interval  arithmetic  as  a  means  of  completely 
rigorous  automatic  error  control  for  computational  processes  using  a  digital 
computer.  More  recently,  people  have  begun  to  appreciate  the  potential  of 
interval  arithmetic  for  control  theory,  and  also  as  a  tool  in  experimental 
designing  on-line,  with  both  a  man  and  a  computer  as  parts  of  the  feedback 
loop. 


There  are  basically  three  different  sources  of  error  associated  with 
numerical  computations.  The  first,  which  we  may  call  the  data  problem,  is 
due  to  the  fact  that  the  value  of  some  given  parameter  may  not  be  known 
exactly  (this  is  for  example  true  for  physically-determined  parameter 
values),  or  else  may  not  be  exactly  represented  in  a  computer  (for  example, 
the number π). A second type of error, usually called truncation error,
is  caused  by  the  necessity  to  terminate  after  a  finite  number  of  steps 
some  infinite  converging  process,  or  (equivalently)  by  the  requirement 
that  some  well-defined  expression  be  evaluated  at  some  point  whose  location 
is  known  only  approximately  (for  example,  the  remainder  term  of  the  Taylor 
series with remainder). The third type of error is round-off error, caused
by the necessity to restrict computational processes to operate on numbers
which do not exceed some predetermined number of digits in length.
Round-off error has traditionally been the most troublesome, primarily because
of its non-analyticity. Attempts at rigorous "pencil-and-paper" bounding
of  round-off  either  are  too  difficult  or  lead  to  hopelessly  pessimistic 
"bounds". 

Interval  arithmetic  keeps  track  of  the  accumulation  of  error  by 
continually  producing  an  interval,  guaranteed  to  contain  the  "true"  result, 
and  performing  the  indicated  arithmetic  operations  on  the  entire  interval. 
Since the implementation of interval arithmetic necessarily involves
ordinary arithmetic operations on the end-points of the interval, which in turn
involve  rounding,  care  must  be  taken  to  perform  the  rounding  properly:  "down" 
for  the  left-hand  end  point,  and  "up"  at  the  right-hand  one.  Thus,  when 
in  the  sequel  we  shall  speak  of  interval  arithmetic,  it  shall  be  understood 
that  in  the  implementation  of  the  operations  on  a  computer  rounded  interval 
arithmetic  is  used.  However,  in  the  formal  discussion  of  interval  arithmetic 
we  shall  ignore  this  fact,  and  define  the  formal  operations  independently 
of  their  implementation. 
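
As an illustration of the rounding rule, the Python sketch below widens each computed end point by one unit in the last place, using `math.nextafter` (available from Python 3.9). This is an assumption made for the example: it is a software substitute for true directed rounding, slightly more pessimistic than necessary, and the name `add_outward` is hypothetical.

```python
import math
from fractions import Fraction


def add_outward(x, y):
    """Add intervals x = (lo, hi) and y = (lo, hi) with outward rounding."""
    lo = math.nextafter(x[0] + y[0], -math.inf)   # left end point: round down
    hi = math.nextafter(x[1] + y[1], math.inf)    # right end point: round up
    return (lo, hi)


# The floating-point numbers nearest 0.1 and 0.2, as degenerate intervals
s = add_outward((0.1, 0.1), (0.2, 0.2))
exact = Fraction(0.1) + Fraction(0.2)             # exact sum of the two floats
print(s, Fraction(s[0]) < exact < Fraction(s[1]))
```

The check at the end uses exact rational arithmetic to confirm that the true sum of the two stored numbers lies inside the rounded interval.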

ARITHMETIC RULES. An interval is simply a closed interval on the real
line, of the form [a,b]. We can also think of an interval as a fuzzy number
x of the form [x-ε, x+ε], although ε is certainly not restricted to being
small in any sense. The arithmetic operations are defined in a natural
fashion, and in fact reduce to ordinary arithmetic when ε=0. (When the
occasion arises, we shall speak of ordinary real numbers as degenerate
intervals.)

Elementary  operations  are  defined  as  follows.  Let  [a,b]  and  [c,d] 
be  a  pair  of  intervals.  Then 

[a,b]  +  [c,d]  =  [a+c,b+d]  ; 

[a,b]  -  [c,d]  =  [a-d,b-c]  ; 

[a,b] * [c,d] = [min(ac,ad,bc,bd), max(ac,ad,bc,bd)] ;

[a,b] / [c,d] = [a,b] * [1/d,1/c] ; (division is defined
only if the interval [c,d] does not contain the point zero).

It  can  be  seen  that  these  operations  are  defined  in  such  a  way  that 
the  result  is  precisely  the  set  of  all  possible  values  of  the  operation 
as  the  operands  range  over  the  argument  intervals. 
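
The four definitions translate almost line for line into code. The Python sketch below follows the formal (unrounded) rules by using exact rational end points; the class name `Interval` and its methods are choices made for this example, not part of any program described in the paper.

```python
from fractions import Fraction


class Interval:
    """Closed interval [lo, hi] with exact rational end points."""

    def __init__(self, lo, hi=None):
        hi = lo if hi is None else hi          # one argument: degenerate interval
        self.lo, self.hi = Fraction(lo), Fraction(hi)
        if self.lo > self.hi:
            raise ValueError("lower end point exceeds upper end point")

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

    def __truediv__(self, other):
        if other.lo <= 0 <= other.hi:
            raise ZeroDivisionError("divisor interval contains zero")
        return self * Interval(1 / other.hi, 1 / other.lo)

    def __eq__(self, other):
        return (self.lo, self.hi) == (other.lo, other.hi)

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"


print(Interval(1, 2) + Interval(3, 5))
print(Interval(1, 2) - Interval(3, 5))
print(Interval(-1, 2) * Interval(3, 5))
print(Interval(1, 2) / Interval(4, 8))
```

A machine implementation would replace the exact end points by floating-point numbers with outward rounding.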

Interval arithmetic is associative, and addition and multiplication
are commutative. Unfortunately, the distributive law does not hold; instead
we have the "subdistributive" law (I, J, and K being intervals):

I * (J + K) ⊂ I * J + I * K.

That the inclusion can indeed be proper can be seen from the example

[-3,3] * [0,2] + [-3,3] * [-1,0] = [-6,6] + [-3,3] = [-9,9] ,

whereas

[-3,3] * ([0,2] + [-1,0]) = [-3,3] * [-1,2] = [-6,6] .

The example also illustrates that a given interval number may have
many multiplicative units: if y is any real number in (-1,1), then all
interval numbers of the form [-1,y] or of the form [y,1] are multiplicative
units for the interval number [-3,3].
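
Both observations are easy to confirm mechanically. In the Python sketch below, intervals are plain (lo, hi) tuples and the helper names `add` and `mul` are hypothetical; the two values of y are arbitrary choices from (-1,1).

```python
from fractions import Fraction


def add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def mul(x, y):
    p = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (min(p), max(p))


I, J, K = (-3, 3), (0, 2), (-1, 0)

lhs = mul(I, add(J, K))            # I * (J + K)
rhs = add(mul(I, J), mul(I, K))    # I * J + I * K
print(lhs, rhs)

# Two of the many multiplicative units for [-3,3]
unit_1 = (-1, Fraction(1, 2))      # form [-1, y] with y = 1/2
unit_2 = (Fraction(-1, 4), 1)      # form [y, 1] with y = -1/4
print(mul(I, unit_1) == I, mul(I, unit_2) == I)
```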

More disruptive is the fact that although an additive unit is unique
([0,0]), interval numbers do not in general possess additive inverses.
(This reflects the fact that once uncertainty or error has been introduced
into a computational process, it cannot be cancelled out, but must be carried
along till the end.) This last property is responsible for almost all of
the difficulties in interval arithmetic, and frequently necessitates very
delicate handling of the specification of a computational algorithm -
something that the current state-of-the-art is not quite up to. (In spite of
this handicap, useful areas of application have already been found.)

The usefulness of interval arithmetic for error bounding comes from
the fact that

1) The elementary arithmetic operations are continuous mappings
from $I_1 \times I_2$ onto $I_3$ (the I's are arbitrary intervals);

2)  Since  the  elementary  operations  are  defined  in  such  a  manner 
that  the  range  of  the  operator  as  the  operands  range  over  the  argument 
intervals  is  contained  in  the  result  interval,  the  same  is  true  for  any 
well-defined  grouping  of  such  operations  on  argument  intervals;  in  other 
words, for all rational functions. Of course, rational operations are all
that computers are capable of executing; thus, any computable function can be
bounded by the use of interval arithmetic.


Let $f(x_1,\ldots,x_n)$ be a given formal rational function in the
indeterminates $x_1,\ldots,x_n$. When the indeterminates take on real values,
f denotes a real-valued function. There may be many different ways of
representing this function, which are all algebraically equivalent; we will
fix a representation $f_1(x_1,\ldots,x_n)$. If we let the indeterminates take
on interval values $X_1,\ldots,X_n$, then the function $f_1$ is still
well-defined (we can regard $f_1$ as a computer program, with a sequence of
arithmetic operations to be carried out in a certain order); we however choose
to call this interval-valued function $F_1(X_1,\ldots,X_n)$. Note that the
fact that $f_1$ and $f_2$ may be algebraically equivalent to f (and to each
other) certainly does not imply that $F_1$ and $F_2$ are equivalent (this is
primarily due to the failure of the cancellation law for interval
arithmetic). The basic theorem of interval arithmetic however states that
for the purposes of error bounding any representation will do:


Theorem. Let f be a given rational function, $f = f(x_1,\ldots,x_n)$, and
let F be any representation of f, F to be evaluated in interval arithmetic.
Let $X_1,\ldots,X_n$ be a collection of closed intervals on the real line.
Then the range of f as each variable $x_i$ ranges over $X_i$ is contained in

$F(X_1,\ldots,X_n)$.


The theorem assures us that interval arithmetic is sufficient to
compute bounds on the range of a rational function over a compact rectangle
in $E^n$. Note that since the evaluation of F can be done using rounded
interval arithmetic, the round-off error is included in the final bounds
produced by F. (It is worth while stressing though that nothing is said
about bounding the round-off that might occur in evaluating f. The
round-off process is not a continuous operation. On some computers, in particular
on the IBM SYSTEM/360, it is easy to cook up examples where f evaluated
at some point p inside the rectangle turns out to be outside the interval
obtained by evaluating F. This is but another aspect of "dirty"
floating-point hardware. The true range of f is however always contained in F.)


As already noted, the width of the interval obtained by evaluating
F may be considerably greater than the width of the true range of f; it is
also generally quite sensitive to the choice of the particular representation
F. This shall be discussed below.
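
A small experiment shows how strongly the choice of representation matters. The Python sketch below uses an example function chosen for this illustration, f(x) = x(1 - x) on [0,1], whose true range is [0, 1/4]; it is not taken from the paper, and the helper names are hypothetical. Three algebraically equivalent representations give three different enclosures, and the expression X - X shows the missing additive inverse directly.

```python
from fractions import Fraction


def sub(x, y):
    return (x[0] - y[1], x[1] - y[0])


def mul(x, y):
    p = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (min(p), max(p))


def point(v):
    return (Fraction(v), Fraction(v))  # degenerate interval


X = (Fraction(0), Fraction(1))

F1 = mul(X, sub(point(1), X))                  # x * (1 - x)
F2 = sub(X, mul(X, X))                         # x - x*x
d = sub(X, point(Fraction(1, 2)))
F3 = sub(point(Fraction(1, 4)), mul(d, d))     # 1/4 - (x - 1/2)*(x - 1/2)

print(F1, F2, F3)
print(sub(X, X))                               # X - X is not [0,0]
```

All three results contain the true range, as the theorem requires, but their widths are 1, 2, and 1/2 respectively.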

SOME  APPLICATIONS  OF  INTERVAL  ARITHMETIC.  Aside  from  the  obvious 
advantage  of  providing  error  bounds,  interval  arithmetic  can  be  used  by  a 
computer  to  control  the  growth  of  error.  While  potentially  the  realm  of 
applications  is  unlimited,  the  author  knows  only  of  the  following  contexts 
in  which  interval  arithmetic  has  been  studied: 

a)  The  initial-value  problem  for  ordinary  differential  equations; 

b)  Finding  roots  of  polynomials; 

c)  Matrix  inversion,  and  the  eigen-value  problem  for  matrices; 

d)  Solution  of  systems  of  simultaneous  (non-linear)  equations; 

e)  The  two-point  boundary-value  problem. 

In  these  areas,  analytic  techniques  are  being  developed  which  make  use  of 
interval  arithmetic  evaluations,  and  which  also  address  themselves  to  the 
peculiar  problems  which  arise  in  using  interval  arithmetic. 

THE INITIAL VALUE PROBLEM FOR ORDINARY DIFFERENTIAL EQUATIONS. Let
dy/dx = f(x,y) denote a system of n first-order ordinary differential
equations, and let $y_0 = y(x_0)$ be given. The application of interval
arithmetic to the automatic generation of solutions to this problem was the first
application suggested by Moore. He designed a computer program using
interval arithmetic which gave solutions with automatic error bounds.

His method is described in [7]. Briefly stated, the solution is
expanded in a Taylor series with remainder (up to a specified number of
terms) at a given point. To bound the remainder term, the required
derivative is evaluated over a whole rectangle (using interval arithmetic) which
is  guaranteed  to  contain  the  point  at  which  the  derivative  should  be 
evaluated.  Iterative  procedures  can  be  specified  which  limit  the  growth 
of  the  width  of  the  resulting  interval. 
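
The idea can be shown on the simplest possible case. The Python sketch below is an illustration in the spirit of the method, not a reconstruction of Moore's program: it treats only the scalar equation dy/dx = y, for which every derivative of the solution equals y itself, it uses exact rational arithmetic instead of rounded intervals, and the candidate enclosure `guess` must be supplied by the caller. All names are hypothetical.

```python
from fractions import Fraction


def i_add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def i_mul(x, y):
    p = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (min(p), max(p))


def encloses(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def taylor_step(y0, h, order, guess):
    """Enclose y(h) for dy/dx = y, y(0) = y0, with step h > 0."""
    # A priori bound: if y0 + [0,h]*guess lies inside guess, the solution
    # stays inside guess on the whole step.
    picard = i_add((y0, y0), i_mul((Fraction(0), h), guess))
    if not encloses(guess, picard):
        raise ValueError("guess is not a valid enclosure for this step")
    total = (Fraction(0), Fraction(0))
    term = Fraction(1)                     # h**k / k!
    for k in range(order + 1):
        total = i_add(total, (y0 * term, y0 * term))
        term = term * h / (k + 1)
    # term is now h**(order+1) / (order+1)!; the derivative in the
    # remainder is evaluated over the whole enclosure, not at a point.
    remainder = i_mul((term, term), guess)
    return i_add(total, remainder)


lo, hi = taylor_step(Fraction(1), Fraction(1, 2), 4, (Fraction(1), Fraction(2)))
print(float(lo), float(hi))
```

For this step the true value is the square root of e, about 1.64872, which the printed end points bracket.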

Since  this  method  depends  on  the  ability  of  the  computer  to  evaluate 
higher-order  derivatives  of  f,  it  is  handy  to  have  a  computer  program  which 
can  do  analytic  differentiation.  Such  computer  programs  have  indeed  been 
written,  either  tailored  for  the  purpose  at  hand  [9],  or  in  more  general 
settings,  such  as  the  FORMAC  capability  for  the  FORTRAN  IV  compiler  on  the 
IBM  7094. 

The  success  of  interval  arithmetic  in  this  setting  is  somewhat  diffi¬ 
cult  to  evaluate.  The  problem  is  that  for  reasonably  complex  systems  of 
equations  and  for  long  ranges  of  integration  with  respect  to  the  independent 
variable,  the  resulting  interval  tends  to  be  too  wide  to  be  of  much  practical 
value.  Attempts  at  elaborate  transformations  to  reduce  the  error  growth 
due  to  the  remainder  term  evaluation  being  too  crude  have  in  general  been 
defeated  by  the  fact  that  the  structure  of  interval  arithmetic  (lack  of 
additive  inverses)  causes  growth  of  widths  of  intervals  due  to  too  many 
operations. Also, on some computers (such as the CDC 1604) the floating-point
hardware structure of the computer is so unfriendly that interval
arithmetic  operations  are  rather  time-consuming.  For  short  integrations, 
and  for  qualitative  estimates,  interval  arithmetic  may  be  very  valuable. 

ROOTS  OF  POLYNOMIALS.  Moore  suggested  that  a  simple  procedure  for 
localizing  zeroes  of  rational  functions  can  be  developed  using  interval 
arithmetic.  Such  a  procedure  was  indeed  programmed  [3].  The  method  is 
based on the simple fact that if P is a given rational form in n variables,
R a rectangle in E^n, and P(R) evaluated in interval arithmetic does not
contain the point 0, then P (as a function of real variables) cannot
possibly have any zeroes in R.
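
The exclusion test above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Interval` class, the example polynomial `p` and the helper `may_contain_zero` are hypothetical names, and exact rational endpoints are used so that round-off does not enter the picture.

```python
from fractions import Fraction


class Interval:
    """Closed interval [lo, hi] with exact rational endpoints."""

    def __init__(self, lo, hi=None):
        self.lo = Fraction(lo)
        self.hi = Fraction(lo if hi is None else hi)

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

    def contains_zero(self):
        return self.lo <= 0 <= self.hi


def p(x):
    # Hypothetical example: P(x) = x*x - 2, with zeros at +/- sqrt(2).
    return x * x - Interval(2)


def may_contain_zero(lo, hi):
    """False means P is guaranteed to have no zero in [lo, hi]."""
    return p(Interval(lo, hi)).contains_zero()


excluded = not may_contain_zero(2, 3)   # P([2,3]) = [2,7]: no zero possible
undecided = may_contain_zero(1, 2)      # P([1,2]) = [-1,2]: zero not excluded
```

Note that the test is one-sided: an evaluation that contains 0 does not prove that a zero exists, it only fails to rule one out.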

An iterative procedure can be implemented based on the fact that if
R_1 and R_2 are two rectangles in E^n each of which contains a given zero of
P, then their intersection must necessarily also contain that zero. Thus,
an  extension  of  Newton's  method  is  possible,  as  long  as  care  is  taken  at 
each  iteration  to  intersect  the  new  interval  (which  may  not  be  contained 
in  the  one  obtained  at  the  previous  iteration)  with  the  old  one,  thus 
guarding  against  divergence.  This  is  called  by  Moore  "the  method  of 
interval  contractions".  Clearly  any  such  procedure  must  converge,  but 
the limit will in general be an interval, rather than a point. If the
limit interval is too wide, the process may be repeated by subdividing
the original rectangle R into smaller ones.

Similar  results  were  obtained  for  the  complex  domain  (Boche  [2] 
having  extended  the  concept  of  interval  arithmetic  to  the  complex  plane) 
by  Hansen  [6]  and  Bennett  [1]  . 

For  this  problem,  interval  arithmetic  may  well  be  the  best  (compu¬ 
tationally  speaking)  method  of  obtaining  results,  especially  if  it  is 
desirable  to  find  regions  guaranteed  not  to  contain  any  zeroes  of  some 
given  function. 

MATRIX  INVERSION  AND  THE  EIGENVALUE  PROBLEM.  The  problem  of  inverting 
matrices  in  the  context  of  interval  arithmetic  comes  from  two  distinct 
sources.  Problem  one:  given  a  matrix  with  real  elements,  obtain  a  (real) 
inverse  with  automatic  error  bounding  of  round-off.  Problem  two:  given 
a  method  of  obtaining  solutions  of  some  problem  in  ordinary  arithmetic 
(for  example,  Newton's  method  in  n  variables)  which  calls  for  inverting 
matrices,  extend  this  method  to  the  case  where  interval  arithmetic  will 
be  used  for  the  solution  (possibly  because  the  coefficients  are  only 
approximately  known).  That  is,  in  problem  two  we  are  asked  to  invert  a 
matrix  with  interval  elements. 

Since  it  is  not  a  priori  clear  what  we  mean  by  an  "inverse"  of  an 
interval-valued  matrix,  we  define  this  inverse  to  be  the  set  of  inverses 
of  all  of  the  real  matrices  contained  in  the  given  interval  matrix.  It  is 
understood  that  the  inverse  is  defined  only  if  the  interval  matrix  does  not 
contain any singular real matrices.

Hansen ([4] and [5]) has worked extensively on this problem. He shows
that a direct extension of the standard methods for matrix inversion (such
as modifications of Gauss-Seidel) to interval arithmetic is not very
useful,  because  of  the  many  arithmetic  operations  involved,  and  (again) 
because  of  the  lack  of  additive  inverses.  Instead,  he  develops  several 
methods,  all  based  on  essentially  the  same  principle.  What  he  does  is  to 
compute  an  (approximate)  real  inverse  of  the  real  center  of  the  interval 
matrix,  and  then  (using  some  iterative  procedure)  compute  in  interval 
arithmetic  bounds  for  the  width  of  each  element  of  the  true  inverse  of  the 
interval  matrix.  The  variations  in  the  iterative  procedures  consist  of 
trying  to  represent  things  in  such  a  way  as  to  have  as  many  terms  as  possible 
be non-interval.

Similar considerations apply to the problem of finding eigenvalues
and associated eigenvectors of real-valued or interval-valued matrices.
Again, direct extensions of the standard techniques used for real
arithmetic are not satisfactory. Hansen [6] suggests iterative procedures
using interval arithmetic once approximate solutions are obtained using
real arithmetic.

The  numerical  results  quoted  by  Hansen  suggest  that  very  good  accuracy 
can  be  obtained  using  interval  arithmetic.  His  methods  do  converge,  although 
he  does  not  discuss  the  rate  of  convergence.  Note  that  in  Hansen's  methods 
it frequently pays to carry out the real computations involved using
extended-precision arithmetic, since in general multiple-precision arithmetic
is  much  faster  than  the  interval  arithmetic  procedures  required,  and  it  is 
worthwhile  to  go  to  great  lengths  to  save  an  iterative  step. 

SYSTEMS OF SIMULTANEOUS EQUATIONS. Let f(x) denote the set of n
rational forms f_i(x) in the n formal variables x_i, and let it be desirable
to find a solution to f(x) = 0 in the vicinity of some point x_0 in E^n. A
method proposed by Moore goes as follows.

Let y be a solution near x_0; i.e. let f(y) = 0. (Of course, we do not
know y explicitly.) Expanding f(x) as a Taylor series with remainder about
y, we have f(x_0) = f(y) + (x_0 - y)J(z), where z is some point "between"
x_0 and y, and J is the Jacobian matrix evaluated at z. Expressing z as
y + θ(x_0 - y), where θ is a vector with elements between 0 and 1, it can
be seen that if R is a rectangle which contains both x_0 and y, then R
also contains z. Hence, we can try to solve (x_0 - y)J(R) = f(x_0) for y.
This will yield a new rectangle R' which contains y, and which can then be
intersected with R to yield a (hopefully) smaller rectangle R". We now
solve (R" - y)J(R") for y, etc.; this will eventually converge to (we hope) a
small interval containing the real solution y.
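
In one variable the step above reduces to y in x_0 - f(x_0)/J(R), followed by intersection with R. The sketch below illustrates this for a hypothetical example, f(x) = x*x - 2 on R = [1, 2]; the function names, the choice of x_0 as the midpoint of R, and the use of exact rational arithmetic are assumptions made for the illustration, not details taken from the source.

```python
from fractions import Fraction


def f(x):
    return x * x - 2          # hypothetical example; the solution is sqrt(2)


def jacobian_range(lo, hi):
    """Range of f'(x) = 2x over [lo, hi]; valid as written because lo > 0."""
    return 2 * lo, 2 * hi


def contract(lo, hi):
    """Solve (x0 - y) J(R) = f(x0) for y, then intersect the result with R."""
    x0 = (lo + hi) / 2
    j_lo, j_hi = jacobian_range(lo, hi)
    assert j_lo > 0, "J(R) must not contain zero"
    q = [f(x0) / j_lo, f(x0) / j_hi]            # endpoints of f(x0) / J(R)
    new_lo, new_hi = x0 - max(q), x0 - min(q)   # R' = x0 - f(x0) / J(R)
    return max(lo, new_lo), min(hi, new_hi)     # R'' = intersection of R and R'


lo, hi = Fraction(1), Fraction(2)
history = [(lo, hi)]
for _ in range(4):
    lo, hi = contract(lo, hi)
    history.append((lo, hi))
```

Because each new rectangle is intersected with the previous one, the widths in `history` can never grow, which is the guard against divergence mentioned for the polynomial case.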

Hansen  gives  a  slight  improvement  in  the  method  [6];  this  is  essentially 
a slightly better way of writing things down for computation.

It can be seen that this is a variant of Newton's method, adapted
for  interval  arithmetic.  It  requires  that  f  contain  no  other  zeroes  near 
the  point  in  question,  for  otherwise  the  Jacobian  J  becomes  singular.  Again 
it  pays  to  obtain  as  precise  an  initial  guess  as  possible,  using  ordinary 
(possibly  extended-precision)  arithmetic. 

The  author  knows  of  no  numerical  experimentation  with  solving  large 
systems  of  equations  using  interval  arithmetic. 


THE  TWO-POINT  BOUNDARY-VALUE  PROBLEM.  This  problem  is  currently 
under  investigation  by  Hansen.  He  has  devised  a  general  method  for 
tackling the solution of

    y^(n) = f(x, y, ..., y^(n-1))

with a total of n conditions prescribed at the end points x = 0 and x = 1.
His  method,  based  on  an  adaptation  of  a  finite-difference  method,  gives 
sharp  bounds  at  the  mesh  points  and  less  sharp  bounds  throughout  the 
interval.  It  will  be  described  in  a  forthcoming  paper. 


OTHER  POSSIBLE  APPLICATIONS.  Interval  arithmetic  may  have  potentially 
many  uses.  It  has  been  suggested  that  it  can  be  used  in  control  theory, 
where  it  is  desirable  to  let  parameters  in  differential  equations  range 
over  certain  restricted  domains.  Another  potential  area  of  utility  is 
in  design,  where  it  can  be  used  in  conjunction  with  an  on-line  computer 
system.  A  designer,  seated  in  front  of  a  terminal  in  communication  with 
a  computer,  can  experiment  with  various  possible  designs  by  letting  some 
variables  range  over  a  set  of  interval  values.  With  instant  feedback  from 
the  computer,  the  designer  can  begin  to  get  a  feel  for  the  effects  of 
perturbations  in  the  design  parameters.  Using  interval  arithmetic  in  this 
setting  is  particularly  attractive  because  sharp  bounds  are  not  required  - 
the  qualitative  estimates  would  be  produced  in  relatively  little  time,  and 
would  at  the  same  time  be  completely  rigorous,  covering  all  possible  cases. 

THE REPRESENTATION PROBLEM. The major trouble with interval arithmetic
is that, due to the lack of inverses, forms normally considered
algebraically equivalent are computationally quite different. It is always
advisable  when  using  interval  arithmetic  to  eliminate  entirely  expressions 
of  the  form  x  -  x.  Other  reductions  of  this  type  suggest  themselves. 

The  general  problem  can  be  stated  as  follows.  Suppose  that  f  is  a 
given function (from E^n into the reals) and suppose that it is desired to
obtain  bounds  on  the  range  of  values  of  f  over  some  rectangle  R  using 
interval  arithmetic.  What  is  the  "best"  way  of  representing  f  from  the 
point  of  view  of  obtaining  the  narrowest  bound? 

There  are  three  different  approaches  to  this  problem.  One  can  try 
to  obtain  an  optimal  representation  for  f.  (The  author  strongly  suspects 
that  this  approach  is  not  in  general  workable;  that  is,  given  a  general 
function  f,  there  is  no  algorithmic  procedure  that  would  allow  the  selection 
of a "best" form.) A second approach can be based on the following: if
f_1 and f_2 are two different representations for f, and f_1(R) = I_1,
f_2(R) = I_2, then I_1 ∩ I_2 also contains the range of f over R. It may
be possible by judiciously choosing among different representations for f
to obtain
successively  better  approximations  to  the  range  of  f.  Although  there  is 
probably  no  algorithmic  procedure  guaranteed  to  converge  for  an  arbitrary 
function  f,  it  may  be  possible  to  find  some  programmable  heuristics  which 
greatly  reduce  growth  of  interval  widths.  The  third  approach  consists  of 
subdividing  the  original  rectangle  R  into  smaller  rectangles  and  performing 
the  required  evaluations  on  each  of  the  small  pieces.  This  process  will 
generally  result  in  narrower  bounds,  and  is  in  fact  guaranteed  to  converge 
to  the  exact  range  of  f  regardless  of  the  representation  chosen.  The 
convergence  is  however  so  slow  compared  to  the  overhead  for  repeating  the 
computations  for  each  one  of  the  smaller  intervals  that  this  approach  is 
not  very  practical. 

Moore  has  noticed  that  a  certain  representation,  which  he  calls  the 
centered  form,  will  frequently  yield  good  results.  Briefly,  this  scheme 
goes  as  follows:  Given  a  formal  function  f  of  (say)  one  variable  x,  and 
assuming that we are interested in evaluating f over the interval [a,b] =
[m - ½(b-a), m + ½(b-a)], we represent f as expanded about the midpoint
m. That is, we obtain a form g by the relation g(x-m) = f(x) - f(m), so
that g(y) = f(y+m) - f(m). g has to be represented in the most "economical"
way possible, so that the number of occurrences of the term y cannot further
be reduced. Since f([a,b]) = f(m) + g([-½(b-a), ½(b-a)]), we have moved the
required interval evaluation to be centered about zero.

For an example, let f(x) = x - x^2, and let the interval in question
be [0,1]. The actual range of values of f is of course [0,¼]. Evaluation
of f as written yields [0,1] - [0,1]*[0,1] = [0,1] - [0,1] = [-1,1].
Writing f in "nested" form as x*(1-x) yields [0,1]*[0,1] = [0,1]; an
improvement, but still not very good. Writing f in centered form, we have
(with y = x - ½) f(x) = f(½) + g(y) = ¼ - y^2, so that f(x) is represented
as -(x-½)^2 + ¼; interval evaluation of this form yields
-[-½,½]*[-½,½] + ¼ = [0,½]. This turns out to be the best that can be done
for any given representation with the evaluation of only one interval. If
however we are willing to evaluate separately the range of f on [0,½] and
also on [½,1], then by using the centered form it turns out that we can
bound the range of f by [0,3/8]. In fact, if we keep halving the width of
the (equal) intervals, it can be shown that interval evaluations approach
the upper bound ¼ linearly with the width.
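
The figures in this example can be checked mechanically. The following sketch evaluates the three representations of f(x) = x - x*x using ordinary interval rules on exact rationals; intervals are plain (lo, hi) tuples and all function names are chosen here for illustration only.

```python
from fractions import Fraction as F


def add(a, b):
    return (a[0] + b[0], a[1] + b[1])


def sub(a, b):
    return (a[0] - b[1], a[1] - b[0])


def mul(a, b):
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))


def as_written(x):
    """f(x) = x - x*x, evaluated literally."""
    return sub(x, mul(x, x))


def nested(x):
    """f(x) = x*(1 - x)."""
    one = (F(1), F(1))
    return mul(x, sub(one, x))


def centered(x):
    """f(m) + g(x - m), where g(y) = y*((1 - 2m) - y) and m is the midpoint."""
    m = (x[0] + x[1]) / 2
    y = sub(x, (m, m))
    fm = m - m * m
    c = 1 - 2 * m
    g = mul(y, sub((c, c), y))
    return add((fm, fm), g)


unit = (F(0), F(1))
whole = centered(unit)                    # one evaluation over [0, 1]
left_half = centered((F(0), F(1, 2)))     # evaluation over [0, 1/2]
right_half = centered((F(1, 2), F(1)))    # evaluation over [1/2, 1]
```

With these definitions `as_written(unit)` is [-1, 1], `nested(unit)` is [0, 1], `whole` is [0, 1/2], and both halves give [0, 3/8], in agreement with the text.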

Lest  the  reader  conclude  that  the  centered  form  is  always  the  best 
representation, consider the function f(x) = x + x^2, and let the interval
in question be [2-s, 2+s] where 0 < s < 2. Then both straightforward interval
evaluation and the nested form give [s^2-5s+6, s^2+5s+6], which is the exact
range of values. In centered form, however, we represent f as (x-2)(x+3)+6;
evaluation of this yields [-s^2-5s, s^2+5s]+6, which exceeds the actual width
by 2s^2.

It is possible (and desirable) to modify the rules of interval
arithmetic in order to reduce spurious growth of intervals. One obvious and
easily programmable change is to define, for all intervals I,

    I^n = {x^n : x ∈ I}.

This in general yields smaller intervals than the computation of
I_1*I_2*...*I_n for I_1 = I_2 = ... = I_n = I. Other modifications of this
sort, which take into account known and easily computable exact ranges of
values of a set of elementary common forms, may improve the performance (and
possibly even speed up the operation of the system, as generally fewer
multiplications will have to be performed during the computations).

Note that with changes of this sort, some of the properties of interval
operations no longer hold. For example, with the change indicated above
for raising to powers, subdistributivity no longer holds in its original
form; the interval I*(I+1) need no longer be contained in the interval
I^2 + I (whether it is or not depends on I). If I = [-1,1], then

    I*(I+1) = [-1,1]*[0,2] = [-2,2],

while

    I^2 + I = [0,1] + [-1,1] = [-1,2].

This  tends  to  complicate  the  representation  problem  even  further,  since 
it  becomes  desirable  to  have  a  representation  contain  as  many  (in  some  sense) 
as  possible  of  the  forms  whose  ranges  of  values  are  exactly  computable. 

The  changes  are  all  for  the  better,  however;  the  complications  result  because 
we  now  have  better  ways  of  representing  functions  than  formerly. 
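
The modified power rule and the resulting loss of subdistributivity can be reproduced with a short sketch. Intervals are (lo, hi) tuples of integers, the function names are illustrative, and `power` is written only for integer exponents n >= 1.

```python
def add(a, b):
    return (a[0] + b[0], a[1] + b[1])


def mul(a, b):
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))


def power(a, n):
    """Exact range {x**n : x in a} for an integer n >= 1."""
    lo, hi = a[0] ** n, a[1] ** n
    if n % 2 == 1:
        return (lo, hi)                  # odd powers are monotone increasing
    if a[0] <= 0 <= a[1]:
        return (0, max(lo, hi))          # even power, interval straddles zero
    return (min(lo, hi), max(lo, hi))    # even power, interval of one sign


I = (-1, 1)
product_square = mul(I, I)               # ordinary rule: I*I
true_square = power(I, 2)                # modified rule: I^2
left = mul(I, add(I, (1, 1)))            # I*(I+1)
right = add(power(I, 2), I)              # I^2 + I
```

Here `product_square` is [-1, 1] while `true_square` is [0, 1], and `left` = [-2, 2] is not contained in `right` = [-1, 2], matching the example in the text.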

SYSTEMS  PROGRAMMING  FOR  INTERVAL  ARITHMETIC.  Programming  for  interval 
arithmetic  is  somewhat  similar  to  writing  (general  real)  computational 
routines  in  the  early  days  of  computing,  before  the  hardware  implementation 
of  floating-point  arithmetic.  At  level  1,  the  systems  programmer  has  to 
build  the  basic  tools  for  performing  interval  computations:  an  adder,  a 
multiplier, an inverter for producing an interval (1/d,1/c) given the
interval  (c,d),  and  (if  exponentiation  is  desired)  functions  that  compute 
good  bounds  on  the  range  of  values  of  the  EXP  and  LOG  operators.  (Similarly, 
other  elementary  transcendental  functions  such  as  SIN  should  be  incorporated.) 

At  level  2,  tools  must  be  provided  for  convenient  interfacing  with  the 
user.  For  a  simple  example:  subtraction  can  obviously  be  implemented  very 
simply  using  the  adder  of  level  1;  at  the  same  time,  it  is  clearly  not 
desirable  to  have  the  user  perform  this  implementation  every  time  he  wishes 
to  execute  subtraction.  Thus,  a  set  of  subroutines  must  be  provided  for 
the  user  which  he  can  conveniently  call.  There  are  likely  to  be  a  large 
number  of  such  subroutines,  for  the  following  reason.  It  is  generally 
desirable  to  allow  the  user  to  mix  the  mode  of  the  variables  freely;  he 
should be allowed to add an integer-valued variable or constant to an
interval-valued one. By the time all possible combinations of modes for
operands  are  accounted  for,  the  number  of  different  subroutines  provided 
is  staggering.  (Actually,  there  are  typically  about  eight  different 
routines,  each  of  which  has  many  entry  points.) 

It  is  clear  that  any  such  package  of  subroutines  should  be  FORTRAN 
compatible.  While  the  level  1  subroutines  usually  have  to  be  written  in 
machine  language,  there  is  usually  no  reason  why  the  level  2  routines 
themselves  cannot  be  written  in  the  FORTRAN  language. 

The  representation  of  interval  numbers  within  a  computer  for  FORTRAN 
might  have  been  quite  awkward  were  it  not  for  the  fact  that  formally  an 
interval  number  looks  just  like  a  complex  number.  Any  FORTRAN  language 
compiler  equipped  to  handle  complex  numbers  can  be  tricked  into  handling 
interval  numbers  by  the  appropriate  TYPE  declarations.  This  is  very  handy 
for  getting  interval  numbers  in  a  decent  format  into  and  out  of  the  computer, 
and  also  for  defining  interval-valued  constants.  (Arrays  of  interval 
numbers  are  also  easier  to  handle  if  they  are  defined  as  being  of  TYPE 
COMPLEX.) 

The  arithmetic  operations  have  to  be  performed  by  calls  to  the 
appropriate  routines.  Some  computers  (for  example,  the  CDC  1604  and  3600) 
have a feature in their FORTRAN compilers which allows the definition of
other  (non-standard)  variable  types.  What  this  means  is  that  the  compiler, 
when  it  encounters  a  variable  of  non-standard  type,  generates  a  call 
automatically  to  the  appropriate  arithmetic  routine.  This  simplifies 
usage  of  interval  arithmetic  greatly,  since  the  user,  once  he  defines  a 
variable  as  being  of  TYPE  INTERVAL,  can  use  it  in  statements  as  if  it  were 
any  other  type  (integer  or  real).  In  fact,  should  this  prove  desirable, 
it  is  possible  to  define  variables  as  being  of  type  "double-precision 
interval"  (the  appropriate  routines  would  have  to  be  provided) .  For  an 
example  of  an  interval-arithmetic  package  of  the  sort  just  described, 
see  [8]. 
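
A rough modern analogue of the non-standard TYPE facility is operator overloading, where the language itself dispatches each arithmetic operator to a user-supplied routine. The sketch below is not a rendering of the package in [8]; the class and its `_coerce` helper are hypothetical, and rounding is ignored. It only illustrates mixed-mode operands and subtraction built from the adder.

```python
from numbers import Real


class Interval:
    def __init__(self, lo, hi=None):
        self.lo = lo
        self.hi = lo if hi is None else hi

    @staticmethod
    def _coerce(value):
        """Promote an integer or real operand to a degenerate interval."""
        if isinstance(value, Interval):
            return value
        if isinstance(value, Real):
            return Interval(value, value)
        return None

    def __add__(self, other):
        other = Interval._coerce(other)
        if other is None:
            return NotImplemented
        return Interval(self.lo + other.lo, self.hi + other.hi)

    __radd__ = __add__               # addition is commutative

    def __neg__(self):
        return Interval(-self.hi, -self.lo)

    def __sub__(self, other):
        other = Interval._coerce(other)
        if other is None:
            return NotImplemented
        return self + (-other)       # subtraction reuses the adder

    def __rsub__(self, other):
        return (-self) + other

    def __repr__(self):
        return f"Interval({self.lo}, {self.hi})"


x = Interval(1, 2)
a = x + 3        # interval + integer
b = 0.5 + x      # real + interval
c = 10 - x       # integer - interval
```

A single coercion routine stands in for the many entry points mentioned above: every operand mode is first promoted to an interval, and one routine per operation then does the work.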

The  level  2  routines  will  depend  to  some  extent  on  the  exact  working 
of the FORTRAN compiler. The level 1 routines are essentially
compiler-independent; they are however heavily dependent on the way the given
computer performs floating-point operations. (For convenience of interfacing
with FORTRAN, the interval endpoints should usually be represented as
floating-point numbers.) The (real) operations have to be performed at each end
point in roughly the sequence: 1) perform the operation in a double length
accumulator by using both the A and the Q registers without rounding; 2)
normalize the result; 3) round to a single-precision floating-point number
by adding (or subtracting) a 1 in the last place, unless the result was
exact. If the computer does not allow this sequence of operations to be
performed using the hardware floating-point instructions, then these
operations have to be simulated by software, using fixed-point instructions.
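
The rounding step can be imitated in software on a machine whose hardware rounds to nearest. The sketch below is a variation on the sequence above, not a transcription of it: instead of a double-length accumulator it compares the rounded result with the exact rational value, and moves one unit in the last place outward only when the hardware rounded the wrong way. It assumes Python 3.9 or later (for `math.nextafter`), finite operands, and no overflow or underflow; all function names are illustrative.

```python
import math
from fractions import Fraction


def round_down(approx, exact):
    """Return a float <= exact, given the nearest-rounded float approx."""
    if Fraction(approx) > exact:
        return math.nextafter(approx, -math.inf)
    return approx


def round_up(approx, exact):
    """Return a float >= exact, given the nearest-rounded float approx."""
    if Fraction(approx) < exact:
        return math.nextafter(approx, math.inf)
    return approx


def interval_add(x, y):
    """Adder: lower endpoint rounded down, upper endpoint rounded up."""
    lo = round_down(x[0] + y[0], Fraction(x[0]) + Fraction(y[0]))
    hi = round_up(x[1] + y[1], Fraction(x[1]) + Fraction(y[1]))
    return (lo, hi)


def interval_inverse(x):
    """Inverter: (1/d, 1/c) for an interval (c, d) not containing zero."""
    c, d = x
    if c <= 0 <= d:
        raise ZeroDivisionError("interval contains zero")
    lo = round_down(1.0 / d, 1 / Fraction(d))
    hi = round_up(1.0 / c, 1 / Fraction(c))
    return (lo, hi)
```

For example, `interval_inverse((3.0, 3.0))` returns two adjacent floating-point numbers that bracket 1/3, while `interval_add((1.0, 1.0), (2.0, 2.0))` returns (3.0, 3.0) unchanged because the result is exact.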

Similar  considerations  apply  to  the  computation  of  the  transcendental 
functions.  The  functions  should  be  computed  in  such  a  way  that  the  result 
is  off  by  at  most  one  in  the  least  significant  bit  of  the  single-precision 
answer.

Exponentiation can be implemented using the LOG and EXP routines. The
system should however first determine if the exponent is an integer (even
if represented as a floating-point number). As indicated, a substantial
reduction in the growth of the widths of intervals can be effected if integer
exponentiation is computed by repeated multiplications, using the
true-range-of-values for raising to powers.


HOMEOSTATIC ORGANIZATIONS FOR ADAPTIVE
PARALLEL  PROCESSING  SYSTEMS 


Robert  M.  Dunn 

U.  S.  Army  Electronics  Command 
Fort  Monmouth,  New  Jersey 

An  effective  Army  is  not  possible  without  the  effective  performance 
of  tactical  communications  and  information  processing  functions.  An 
intriguing  possible  realization  for  the  future  is  one  which  considers  an 
integrated  system  providing  service  for  both  the  communications  and 
information  processing  functions.  Within  the  realm  of  such  a  possibility, 
one  may  visualize  a  utility-like  availability  to  these  services  for  any 
qualified  and  authorized  user. 

At  least  three  distinct  approaches  to  such  a  military  system  are 
apparent.  The  first  approach  would  provide  each  tactical  element  with 
a  separate  facility  for  the  integrated  services.  The  second  approach 
would  be  to  have  many  tactical  elements  time  share  a  central  facility. 

And  the  third  approach  would  be  to  provide  an  Army-wide,  common-user 
network  for  the  integrated  services.  This  network  would  be  designed  to 
tolerate  losses  of  parts  of  itself  without  serious  degradation  of  service 
from  the  remaining  balance. 

From  a  technological  point  of  view,  the  separate  facility  approach  is 
clearly  the  most  near  term  and  expedient.  However,  in  the  long  term,  this 
approach  suffers  from  two  weaknesses.  First,  either  each  facility  is  tailored 
to  each  tactical  element  or  a  single  type  of  overly  general,  excessively 
capable  facility  is  designed  for  all  needs.  Neither  alternative  is  very 
desirable.  The  second  weakness  is  that  the  set  of  separate  facilities  must 
be  embedded  in  a  superfacility  to  provide  the  basis  for  interchange  of  infor¬ 
mation  between  functionally  distinct,  but  organizationally  unified  tactical 
elements. 

The centralized, time-sharing approach implies minimal equipment
costs  and  simplified  logistics.  This  approach  also  provides  ample  oppor¬ 
tunity  for  the  just  cited  information  interchange.  However,  this  centralized 
approach  guarantees  chaos,  not  to  mention  severe  losses  and  possible  defeat, 
in  the  event  of  the  destruction  of  such  a  facility.  The  mere  hint  of  its 
existence  would  assure  that  such  a  facility  became  a  prime  target. 

The  merits  and  demerits  of  the  network  approach  are  not  as  readily 
compared  and  balanced  against  each  other.  Technologically,  the  network 
approach  is  the  least  certain.  Economically,  it  is  possibly  as  expensive 
or more expensive than the most costly already considered. Technological
turns will establish the degree of the logistics problems it presents, and
so  on.  But,  all  of  the  real  or  anticipated  uncertainties  or  drawbacks  are 
potentially  balanced  or  surpassed  by  the  potential  advantages  of  this 
approach.  It  could  increase  operational  flexibility.  It  could  enhance 
tactical  survivability.  The  quality  of  service  would  be  greatly  improved. 
Such  an  approach  could  even  foster  a  design  that  permits  dynamic  system 
growth  and/or  adaptation  to  changing  requirements  and/or  applications 
and/or  environments. 

However,  a  great  deal  of  knowledge  is  not  available  on  network 
processor  systems.  This  scarcity  is  the  cause  of  our  uncertainty  about 
the  network  approach  to  the  integrated  system.  Therefore,  the  objective 
of  this  discussion  is  to  enlarge  our  generalized  understanding  of  a  network 
which  is  primarily  composed  of  digital  processors,  information  storage 
sub-systems, and other special or limited purpose sub-systems (for
example, analog processors, hybrid processors, communications equipment,
weapons systems, etc.). This integrated tactical utility is considered
to be geographically dispersed and offers the following features:

-  Each  subscriber  approaches  the  system,  uniformly,  as  a 
common-user,  whether  it  be  for  communications  or  information 
processing  services. 

-  Automatic  control  of  the  system  is  operationally  distributed 
across  the  nodes  of  the  network. 

-  The  system  automatically  determines  which  aspects  of 
itself  are  necessary  to  satisfy  each  user's  service  request  by 
analyzing  each  service  request.  The  system  then  automatically 
allocates  and  interconnects  the  necessary  resources  if,  and 
from  wherever,  they  are  available  within  the  network. 

- Multiple users may simultaneously access the system
without incurring mutual interference to the limit of the
system's capacity.

-  Lastly,  arbitrary  subsets  of  the  users  of  the  system  may 
cooperate  via  the  system,  using  it  as  their  means  of  inter¬ 
connection  and  basis  for  cooperation. 

The  most  important  implication  of  these  features  is  the  set  of  items 
that  must  be  considered  as  separately  allocatable  resources  within  the 
system.  Such  usual  things  as  computer  programs,  storage  capacity, 
information,  communications,  sensors,  and  processors  are  within  this 
set.  But,  atypically,  this  set  includes  the  control  function  or  even  other 
users.

Our formulation of the system relies on two assumptions. The first
is  that  for  any  system  of  the  type  under  consideration,  there  exists  a 
positive  integer  N  such  that  the  system  is  said  to  be  an  Nth  level  organiza¬ 
tion.  This  implies  that  there  are  N  hierarchical  levels  of  structure 
where  lower  level  functional  elements  are  combined  to  form  higher  level 
functional  elements.  These  combinations  may  either  be  permanent  or 
temporary  for  some  transient  functional  purpose. 

The  second  assumption  is  that  every  output  of  every  functional  element 
is  an  input  to  some  other  functional  element.  Therefore,  the  inputs  and 
the  outputs  are  members  of  the  same  set  of  symbols. 

In  addition  to  these  assumptions,  there  are  a  number  of  constraints 
upon  the  formulation  of  the  system. 

First,  the  behavior  of  the  system  must  be  able  to  be  characterized 
such  that  all  functional  elements  of  a  given  type  have  identical  physical 
realizations.  The  interests  of  economics  and  logistics  are  the  motivation. 

Second,  the  control  function  must  imply  neither  a  centralized  organiza¬ 
tion,  nor  an  omniscient  attitude  towards  the  system's  status  nor  a  large 
amount  of  status  information  or  transmission  thereof.  Otherwise,  the 
survivability  objective  would  be  immediately  obviated.  Next,  the  control 
function  must  allow  for  a  non-deterministic  allocation  of  resources.  When 
resources are probabilistically allocated as the result of a search, the
degree  of  omniscience  and  the  amount  of  status  information  necessary  to 
the  control  function  may  be  drastically  diminished. 

Another  constraint  upon  the  formulation  is  that  the  notion  of  a  control 
function  must  be  limited  to  explicit  control  only  of  a  node  over  itself.  Each 
node  of  the  system  must  neither  require  direction  from  nor  be  required  to 
give  direction  to  other  nodes  in  the  system.  Implicitly,  nodes  may  affect 
the  behavior  of  each  other  by  generating  undirected  service  requests. 
Enhanced survival is the principal motivation for this constraint.

The final restriction is of a slightly different type. System
effectiveness requires that there be a careful delineation of the operational and
information  environments  along  with  the  actual  functional  sequences  to  be 
performed.  System  efficiency  requires  that  these  delineators  not  be 
overly  specific.  The  implication  here  is  that  such  systems  as  we  are 
discussing  ought  not  to  be  programmed  in  the  usual  sense.  That  is,  the 
development  of  a  step-by-step  sequence  of  directions  is  not  the  role  of  the 
user.  The  user,  instead,  specifies  two  things.  On  the  one  hand  he  declares 
the  name  or  sequence  of  names  of  the  function  or  functions  to  be  performed. 
And on the other, he denotes the environmental and data references
germane to these functions. The user then accepts whatever
implementation is open to the system which will both satisfy these specifications
and  adhere  to  whatever  priorities  or  time  requirements  that  may  be  in 
effect.  This  "functional  programming"  approach  is  feasible  because  of 
the  limited  category  of  functional  classes  to  which  the  military  user  is 
usually  constrained.  Therefore,  although  the  digital  processors  within 
the  system  may  be  capable  of  emulating  a  Universal  Turing  machine, 
their  actual  pragmatic  use  will  be  limited  to  a  well-defined  set  of  inter¬ 
pretations.  For  example,  they  may  be  microprogrammed  in  some  very 
gross  sense. 

Towards  stating  the  model,  consider,  now,  that  an  arbitrary  abstract 
entity  known  as  an  organization  has  two  major  components;  the  structure 
and the behavior. Also consider that control is another abstraction
interwoven into the fabric of the organization. The purpose of control is to
assure that the behavior is achieved within the confines of the structure
according to conditions imposed by the environment in which the organization
exists. Finally, consider that control, structure, and behavior are
further  related  in  that  the  range  of  possible  choices  for  any  one  of  them  is 
severely  constrained  by  the  previously  chosen  ranges  for  the  other  two. 

In  fact,  even  after  determining  these  three  sets  of  possibilities,  it  will 
usually  be  the  case  that  just  a  few  of  the  possible  combinations  will  be 
reasonable  to  consider  according  to  various  criteria. 

If the term "system" is now considered to be the operational equivalent
of "organization", then the set of primitive characteristics identifies
the range of possible structures as that which also includes conventional
telecommunications networks. More precisely, the structures are
those partially describable as three-dimensional coordinate arrays.

These  arrays  are  characterized  by  two  properties.  First,  elements  of 
the  network  need  not  exist  at  every  coordinate  intersection  of  the  array. 
Second,  interconnections  only  exist  between  elements  of  the  network 
according  to  some  appropriate  functional,  temporal,  topological  or 
metric definition of "nearness", i.e., those which are close together in
some  well-defined  sense. 

In  turn,  the  set  of  possible  behaviors  is  that  which  also  includes  the 
performance  of  arbitrary  communications  and  information  processing 
functions on a time-shared basis. A more precise statement would be
that the behaviors are those partially describable as arbitrary
sequences  of  any  of  transmission/reception,  modulation/demodulation, 
multiplexing/demultiplexing,  switching,  data  manipulation  or  computation 
functions.  For  any  sequence  or  element  of  a  sequence  two  properties  hold. 
First,  the  system  may  not  be  continuously  active  in  the  response  to  that 
sequence  or  one  of  its  elements.  And  second,  for  any  functional  module 
of the system and any two consecutive, even contiguous, periods of
activity of that module, the functional module need not be active in
response  to  the  same  sequence,  or  element  thereof,  in  its  successively 
active  periods. 

Finally,  the  set  of  possible  controls  is  that  which  also  includes  the 
ability  of  local  sections  of  the  network  to  be  self-managing.  The 
definition  of  "local"  is  dynamically  determined,  in  time,  according  to 
the  magnitude  of  the  response  required  by  arbitrary  service  request. 
Again towards precision, the controls are those partially describable
as mappings from the cross product of the set of stimulators or
inputs with the set of functional elements onto the set of substructures
of the organization. Each of these sets of structures, behaviors, and
controls is very comprehensive.

Via these notions of "organization," "structure," "behavior," and
"control," there exists a precise context in which to formulate the model
which hopefully will exhibit an ability to select some optimal combination
of members of the three sets. In so doing, the model must allow for a
functionally modular system which degrades gracefully and which can
dynamically alter its own active internal organization. The model must,
for the sake of generality, also allow for a homogeneous system as
regards process, structure, and behavior. By this we mean that the
abstract  characterizations  of  either  gross  purpose,  gross  structure,  or 
gross  behavior  of  any  functional  element  at  any  Kth  level  of  the  system 
is  isomorphic  to  the  corresponding  abstractions  for  arbitrary  functional 
elements  at  the  same  or  different  levels  of  the  system. 

We  now  make  the  following  definitions: 

Definition  1.  Functional  Element  -  an  instance  of  a  separately  allocatable 
system. 

Definition  2.  Change  Requirement  -  an  input  to  some  functional  element 
of  the  system. 

Definition  3.  Configuration  -  a  set  of  interconnected  functional  elements 
and  a  description  of  that  interconnection. 

Definition  4.  Transformation  Rule  -  an  operator  on  the  set  of  configura¬ 
tions. 

Definition  5.  Response  -  a  change  requirement  generated  by  the  activity 
of  a  functional  element. 
We now let

$S_n$ = an $n$-level organization;

$a_i$ = the $i$th type of change requirement;

$M_K^{A_j^K}$ = a functional element at level $K$ of the system, of type
$A_j^K$, where $j$ ranges over the set of element types which are
possible at level $K$;

$\Sigma_i$ = a type of configuration, where the set of configurations
is given by the set of graphs whose members are models
for interconnection schema within the system;

$\Gamma_{a_i}$ = a transformation rule induced by an $i$th type change
requirement;

$F$ = the control function;

$H$ = the response function.

Any system may then be defined as an ordered sextuple

$$S_n = \{A, C, G, D, F, H\}$$

where

$A = \{a_1, a_2, \ldots, a_p\}$, the set of types of change requirements;

$C = \{\Sigma_1, \Sigma_2, \ldots, \Sigma_q\}$, the set of types of configurations;

$G = \{\Gamma_{a_1}, \Gamma_{a_2}, \ldots, \Gamma_{a_p}\}$, the set of types of transformation
rules corresponding to the set of types of change requirements;

$D = \{A^1, A^2, \ldots, A^n, \ldots, A^w\}$, the set of sets of possible
element types at each level of the system; $A^w$ implies that
the system may become a more complex system up to $w$ levels
deep;

$F: \{A \times D\} \to \{G \times C\}$, the control function;

$H: (\Gamma_{a_i}, M_K^{A_j^K}) \to (M_{K+\beta}, a_o)$, the response function, where
$-1 \le \beta \le 1$ and $\beta$ is always an integer ($-1$, $0$ or $1$).
It is seen that the control function is a mapping from the set of
ordered pairs of change requirement types and element types into the set
of ordered pairs of transformation rules and configuration types. It is
also seen that the response function is the result of the application of a
change-requirement-induced transformation rule to a functional element;
more strictly speaking, to a configuration of that functional element. The
result is some new functional element and a generated change requirement.
Here, we can interpret $\beta = 0$ to mean that the new functional
element is either the same as the old one or, at most, a reconfiguration
of the components of the old functional element. Similarly, we may
interpret $\beta = 1$ to mean that the old functional element has been combined
into a more complex element. Finally, we may interpret $\beta = -1$ to mean
that the new functional element is the result of some decomposition of the
old functional element. In addition, the following is always true. If $M$
is a functional element at the $K$th level and is of type $A_j^K$, then we may
say that

$$M_K^{A_j^K} = \sum_{i=0}^{K-1} \Sigma_h \, M_i$$


This  states  that,  in  general,  each  functional  element  is  an  interconnection 
of  lower  level  functional  elements  arranged  in  one  of  the  possible  configura¬ 
tions.  In  particular,  this  is  true  of  the  system  as  a  whole. 

$$S_n = \sum_{i=0}^{n-1} \Sigma_l \, M_i$$

Finally,  we  note  that  the  model  is  independent  of  concern  for  actual 
levels  of  system  organization  or  echelons  of  users.  We  also  note  that  the 
nature  of  the  functional  elements  or  their  manners  of  implementation  did 
not enter into the model. Therefore, in systems such as this we expect
to be able to dynamically juggle arrangements, relationships, or
interconnections between functional elements as diverse as trunk group frames
in a time division multiplex system, the multiplexors themselves, operating
procedures, or even entire nodes. At the same time, the forms of
realization of these functional elements may be as varied as an information
stream, a message, wired logic, stored logic, or even stored program.

In  practical  terms,  a  network  such  as  depicted  in  Figure  1  may  range 
over many hundreds of square miles. The movement of users and equipment
within this area appears to be best served by the class of systems
discussed herein. Such networks would necessarily employ random
search techniques for locating individual users within their domain.


As  a  consequence,  the  typical  node  in  such  a  network  may  itself 
be  a  network  of  the  same  class  as  depicted  in  Figure  2.  Again  random 
search  techniques  would  be  utilized  for  routing  traffic  within  the  node. 

In  turn,  possible  depictions  of  typical  processor  and  memory  modules 
appear  in  Figure  3  and  4  respectively.  Further  details  on  such  practical 
considerations  may  be  found  in  the  brief  bibliography  at  the  end  of  the 
paper. 

In  summary,  we  have  been  talking  about  a  network  of  processors 
which  controls  its  own  active  interconnection  scheme,  dynamically 
regulates  the  distribution  of  load  across  itself  in  order  to  achieve  an 
equilibrium  state,  and  does  all  of  this  without  a  central  scheduler  or 
controller! 

Borrowing  from  the  physiologist,  we  shall  label  the  drive  towards 
an  equilibrium  state,  the  "homeostatic"  aspect  of  our  system  and  claim 
that  its  realization  is  a  function  of  the  organization  which  characterizes 
the  system.  The  alteration  of  temporal  and  functional  relationships 
between  nodes  in  the  network  in  response  to  new  functions  or  service 
requests we take as an ability to alter behavior and so label the system
"adaptive." The parallelism in the system is readily apparent. Therefore,
in general, we have been talking about homeostatic organizations
for adaptive parallel processing systems.

ABSTRACT. The question to be considered here concerns the
interconnection of a set of one-shot devices which are to be activated in
one of several predetermined sequences. Selection of the first device
is made externally. In addition to performing its own action, each device
initiates a pulse which travels along an explosive cord or MDC line to
activate the next device in the sequence or to break another linking
explosive cord. The essence of the problem is to define a procedure for
interconnecting  all  required  sequences  so  that  one  and  only  one  will 
operate  correctly  when  properly  initiated.  This  is  done  by  setting  up  a 
connection  matrix  representing  all  the  sequences  and  then,  by  various 
operations  on  it,  determining  which  links  are  to  be  broken  and  by  what 
devices.  This  gives  a  solution  but  not  an  optimal  one.  Suggestions  are 
made  for  improving  individual  solutions.  An  example  is  carried  through 
the  entire  discussion  and  a  computer  program  which  mechanizes  the 
procedure  is  exhibited  as  an  appendix. 

INTRODUCTION.  The  question  to  be  considered  here  concerns  the 
interconnection  of  a  set  of  one-shot  devices  which  are  to  be  activated  in 
one  of  several  predetermined  sequences.  Selection  of  the  first  device  is 
made  externally.  In  addition  to  performing  its  own  action,  each  device 
initiates  a  pulse  which  travels  along  an  explosive  cord  or  MDC  line  to 
activate the next device in the sequence or to break another linking
explosive cord. The essence of the problem is to define a procedure for
interconnecting all required sequences so that one and only one will operate
correctly when properly initiated.*

STATEMENT AND DISCUSSION OF PROBLEM. Given a set of n
devices, $d_1, d_2, \ldots, d_n$, it is required to interconnect them by explosive
cords  so  that  various  preselected  sequences  of  these  devices  will  be 
actuated.  Explosive  cords  for  all  sequences  must  be  present  at  the  initial 
installation  of  the  devices  and  the  final  choice  of  sequence  is  made  at  the 
time  of  operation  by  selecting  the  starting  point  for  the  required  chain  of 
events. 


*Properties of MDC lines, methods for construction of the devices,
possible applications and other questions concerned with the physical
realization of the system will be discussed elsewhere [1].
Example: Given devices a, b, c, d and e, it is required
to be able to actuate them in any one of the four sequences
abcde, cbade, bdac or d, upon external command. The
first sequence will require MDC lines to carry a pulse
from a to b,* from b to c, etc.; the third will need MDC
lines from b to d, from d to a, and from a to c. For the last
sequence, only d is to operate and no other MDC lines
are needed (or permitted).


Figure  1 

Figures 1A and 1B diagram the first and third sequences
of  the  example.  Figure  1C  combines  the  connections  for 
the  two  cases. 


However,  combining  all  the  sequences  does  away  with  the  definition 
of  a  unique  successor  to  each  device  and  special  precautions  must  be 
taken  to  eliminate  unwanted  paths  in  the  explosive  chain.  This  can  be  done 
by  destroying  (negating)  certain  pathways  by  means  of  other  exploding  cords. 

Example: In Figure 1C, if device a is chosen as the starting
point, implying** sequence abcde, the explosive pulse
will travel to c as well as to b and interfere with proper
operation of the system. To assure correct sequencing,
MDC lines (a, c) and (b, d) would have to be cut. MDC
line (d, a) can remain intact since element a has already
functioned by the time element d is activated.


*In  this  discussion  the  MDC  lines  or  explosive  cords  will  be  treated  as 
being  unidirectional.  The  bidirectional  case  will  be  considered  in  a  later 
section. 

**It  should  be  noted  that  each  sequence  must  start  with  a  different  element. 
For if two started with the same element, abcd and acd, for example, some
external  action  must  take  place  to  indicate  which  of  the  two  has  been  selected. 
This  external  event  is  then  actually  part  of  the  system  and  should  be 
labeled  as  the  first  device. 
Example: Logically, it is relatively simple to cut (b, d)
in the first case by having a initiate the cutting action
(by means of another exploding cord, for example). It
is not so simple to remove (a, c) because the pulse to c
may get through before the pulse to destroy the connection
line does, since both emanate from a. To obviate this,
we make the

Assumption:  It  is  possible  to  construct  the  equipment  so  that 
MDC  destroyers  (negators)  act  before  all  other  MDC  lines  emanating 
from  the  same  device. 

Even though several devices might be available for breaking an MDC
line,  the  strategy  here  will  be  to  cut  it  as  early  as  possible  in  the  sequence. 
That  is,  activation  of  the  first  device  will  cut  away  all  MDC  lines  which 
will  interfere  with  the  operation  of  its  particular  sequence. 

It  may  happen  that  a  negator  which  is  essential  to  correct  operation 
of one sequence interferes with proper operation of another. The negator
must  itself  be  broken  by  another  explosive  cord  (second  negation)  during 
operation  of  the  other  sequence. 

Example: Figure 2A shows the diagram of 1C with the
addition of the two negators to permit sequence abcde to
function properly. For sequence bdac, line (b, d) will
function properly (line (b, c) is to be ignored for the
purpose of this particular explanation) since the negator
from a* has not yet been activated. However, line (a, c)
will be broken by the negator from a before it performs
its function since, by the assumption, the pulse travels
faster along a negator than along a connecting line. That
negator must therefore itself be
broken before it can function in sequence bdac. This
can be done, as shown in Figure 2B, by a second
negation from either b or d and the first element is
chosen to simplify the procedure.

Figure 2 (panels A and B)

*The negators from a to break (a, c) and (b, d) are shown as crosses on the
lines. The point of origin of a negator (and, later, second negator) is shown
close to the cutting point to avoid excessive lines in the diagram.


Will  negations  of  an  order  higher  than  the  second  be  required?  That 
is,  will  there  be  occasions  where  it  is  necessary  to  break  a  second 
negation?  The  answer  is  no,  as  the  following  informal  argument  shows. 
Doing  away  with  a  second  negation  implies  that  the  negator  which  it  breaks 
has  become  necessary  in  some  third  sequence  or  that  its  original  purpose 
has  been  interfered  with.  The  latter  case  is  impossible  since  negators 
are chosen to act from the first element in a sequence and the other
problem can be bypassed by having a separate negator for each different
sequence  requiring  it.  If  a  different  strategy  were  chosen  for  the  origin 
of  negators  it  is  quite  possible  that  a  higher  order  negator  would  be  needed. 

SOLUTION. Let n be the number of devices which have been denoted
as $d_q$, $1 \le q \le n$, and which are to be arranged into $m \le n$ operating
sequences. To simplify the notation we shall drop the symbol d and use
the index q as the label. Thus each operative sequence is represented
as a sequence of integers. We now form an m x n sequence matrix, S,
as follows: each of the m sequences of integers will form a row of S,
where the order of the rows is arbitrary; if any row has fewer than n integers,
sufficient 0's are added on the right to bring the number up to n.


Example: Given six devices labelled 1, 2, ..., 6, with the
following required sequences: 316524, 4312, 54321,
654321, 123456, the following 5 x 6 sequence matrix
can be formed:

$$
S = \begin{pmatrix}
3 & 1 & 6 & 5 & 2 & 4 \\
4 & 3 & 1 & 2 & 0 & 0 \\
5 & 4 & 3 & 2 & 1 & 0 \\
6 & 5 & 4 & 3 & 2 & 1 \\
1 & 2 & 3 & 4 & 5 & 6
\end{pmatrix}
$$

Associated with each S matrix is an n x n connection matrix C whose
entries $c_{rt}$ are 1 if for some sequence there is an MDC line from device
r to device t and 0 otherwise. These lines will also be referred to as major
connectors to differentiate them from first and second negators.
Example: Associated with S above is the 6 x 6 matrix:

$$
C = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
$$

Each sequence will be identified by its first element. For any
sequence k of length p, the rows and columns of C can be permuted
until the first p rows and columns are in the same order as in the
sequence. Label this permuted matrix $C_k$. The last n-p rows will
not be needed but the last n-p columns will, although their order is
irrelevant. Note that the row labels represent the required devices
and that the column labels represent all the existing devices.


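Example:

$$
C_3 = \begin{array}{c|cccccc}
  & 3 & 1 & 6 & 5 & 2 & 4 \\ \hline
3 & 0 & 1 & 0 & 0 & 1 & 1 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 \\
6 & 0 & 0 & 0 & 1 & 0 & 0 \\
5 & 0 & 0 & 1 & 0 & 1 & 1 \\
2 & 1 & 1 & 0 & 0 & 0 & 1 \\
4 & 1 & 0 & 0 & 1 & 0 & 0
\end{array}
\qquad
C_5 = \begin{array}{c|cccccc}
  & 5 & 4 & 3 & 2 & 1 & 6 \\ \hline
5 & 0 & 1 & 0 & 1 & 0 & 1 \\
4 & 1 & 0 & 1 & 0 & 0 & 0 \\
3 & 0 & 1 & 0 & 1 & 1 & 0 \\
2 & 0 & 1 & 1 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 1
\end{array}
$$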
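The permutation that produces $C_k$ can be sketched as follows. The function name `permuted_matrix` is an assumption made here, and the unused devices are placed in ascending order in the trailing columns, which is one arbitrary choice among the orders the text permits.

```python
def permuted_matrix(C, seq):
    """Return (row_labels, col_labels, rows) for the matrix C_k of a sequence.

    Rows are the p required devices in sequence order. Columns are the
    same devices followed by the devices not used by the sequence.
    """
    n = len(C)
    unused = [d for d in range(1, n + 1) if d not in seq]
    col_labels = list(seq) + unused
    row_labels = list(seq)
    rows = [[C[r - 1][t - 1] for t in col_labels] for r in row_labels]
    return row_labels, col_labels, rows


# Connection matrix of the running example.
C = [
    [0, 1, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]

row_labels, col_labels, C5 = permuted_matrix(C, [5, 4, 3, 2, 1])
print("   ", *col_labels)
for label, row in zip(row_labels, C5):
    print(label, "|", *row)
```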


The following observations can be made on the $C_k$. The first p-1
elements on the first superdiagonal represent the major connectors
required for the proper functioning of sequence k. There are no 1's on
the main diagonal since no device is connected to itself.


For each required device, the corresponding row contains a 1 in a
column wherever there is a connection from it to another device. The 1's
below the main diagonal represent connections to devices already activated
and will be of no interest here. Those above the first superdiagonal
represent connections to devices which can still be activated but which
must be prevented from operating at this time.
Finally, if p < n, there must be no connection remaining from device p
to any other still active device. This leads to the formulation of

Rule I: A sufficient set of negations for each sequence k
is determined by all the 1's above the first superdiagonal
in $C_k$ as well as the $p$th 1 on that diagonal if it exists. In
each case, the source of the negator is device k.

A  more  formal  proof  that  this  rule  produces  the  desired  negations  will 
be  found  in  Appendix  I. 

It will be convenient to record the negations in the $C_k$ by encircling
the 1's identified in Rule I.

Example:

[Matrices $C_3$, $C_4$, $C_5$, $C_6$ and $C_1$, with column orders 316524,
431256, 543216, 654321 and 123456 respectively; negated 1's encircled.]

The five permutations of C of the example are shown
and the negators for each sequence have been encircled
in their respective matrices. In $C_5$ line (1,6) requires
negation (even though on the first superdiagonal) to
prevent 6 from being activated when it is not required
in the sequence. Similarly, if an MDC line had existed
from 2 to 5 (and/or 6), $C_4$ would have shown the need
for its negation. For each sequence k, the first device
will be taken as the source for the negators.
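Rule I can be mechanized without forming the permuted matrices explicitly, since "above the first superdiagonal" simply means "two or more positions later in the column order". The Python sketch below is an illustration under that reading and is not the program of Appendix II; the name `first_negators` and the dictionary output format are assumptions made here.

```python
def first_negators(sequences, n):
    """Apply Rule I. Returns {line (r, t): sorted list of negator sources}."""
    lines = {(r, t) for seq in sequences for r, t in zip(seq, seq[1:])}
    negators = {}
    for seq in sequences:
        k, p = seq[0], len(seq)
        # Column order of C_k: the sequence, then the unused devices.
        cols = list(seq) + [d for d in range(1, n + 1) if d not in seq]
        for i, r in enumerate(seq):
            # Columns strictly above the first superdiagonal in row i.
            targets = cols[i + 2:]
            # For the last required device the p-th superdiagonal entry
            # (if any) must be negated as well.
            if i == p - 1 and p < n:
                targets = cols[p:]
            for t in targets:
                if (r, t) in lines:
                    negators.setdefault((r, t), set()).add(k)
    return {line: sorted(src) for line, src in sorted(negators.items())}


SEQUENCES = [
    [3, 1, 6, 5, 2, 4],
    [4, 3, 1, 2],
    [5, 4, 3, 2, 1],
    [6, 5, 4, 3, 2, 1],
    [1, 2, 3, 4, 5, 6],
]

NEGATORS = first_negators(SEQUENCES, n=6)
for line, sources in NEGATORS.items():
    print(line, sources)
```

For the running example this yields, among others, negators from devices 1, 4 and 5 on line (1,6), in agreement with the discussion of $C_1$, $C_4$ and $C_5$.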

Once the negations are determined for all the sequences, they can be
combined and exhibited in the matrix C by encircling the 1's involved and
labeling each circle with the index of the devices from which the negators
must come. Call the matrix with this extra labeling C'.

Example:

[Matrix C', in which each encircled 1 is labeled with the devices
that originate its negators; the entry for line (1,6) carries the
labels 1, 4 and 5.]

Line (1,6) is shown to need negations in sequences 1, 4,
and 5 by the matrices $C_1$, $C_4$ and $C_5$ respectively. Therefore,
since the negation will come from the first device in
each sequence the device names 1, 4 and 5 are appended as
shown.

The information on negators can also be condensed into tabular form
as is actually done, but in slightly different format, in the computer
procedure.


Example:

MDC LINE    NEGATOR FROM DEVICE
(1,2)       3
(1,6)       1, 4, 5
(2,4)       1
(3,1)       5, 6
(3,2)       3, 4
(3,4)       3
(4,5)       4
(5,2)       5, 6
(5,4)       3
(5,6)       5


C', which now incorporates information on negators as well as major
connectors, can be used to determine a set of second negators. This
matrix can be permuted, as was C, to form $C'_k$, with the first p-1 elements
in the first superdiagonal indicating not only the major connectors required
for sequence k but also those negators capable of interfering with its
proper operation.

Example:

[Matrix $C'_3$, with rows and columns in the order 3, 1, 6, 5, 2, 4.]

Since only negators involving required major connectors
for sequence k are of interest here, the others are not
shown.


Let $N_{w,d}$ denote the negator of major connector w from device d. If
d appears in sequence k after the origin of w, the negator will not
interfere with the proper functioning of the system. If d is the origin of w or
appears before it in sequence k, then the negator will cut the required
major connector before it can operate. This leads to

Rule II: To find a sufficient set of second negations for each
sequence k, consider $C'_k$. Provide second negations for all
those negators $N_{w,d}$ which affect the first p-1 elements on the
first superdiagonal and for which the row labeled d does not
follow the row in which the negator in question appears. In
each case, the origin of the second negator is device k.

A  more  formal  proof  that  this  rule  actually  provides  the  necessary  control 
over  the  negations  is  also  contained  in  Appendix  I. 

Example: Consider $C'_3$. w = (3,1) is negated by both devices
5 and 6. Since columns labeled 5 and 6 follow column 3, no
second negation is required. On the other hand w = (5,2)
requires second negations for the negators from 5 and 6 since
neither of these two columns follows column 2. The second
negation, discovered in $C'_3$, comes from device 3. For
w = (1,6) a second negation is required for the negator from
1 while none is required for those from 4 and 5.


The following list is an extension of the previous
one to show second negations. They are in parentheses
behind the negators they affect.

MDC LINES    NEGATORS AND SECOND NEGATORS
(1,2)        3(4)
(1,6)        1(3), 4, 5
(2,4)        1(3)
(3,1)        5, 6
(3,2)        3(5,6), 4(5,6)
(3,4)        3(1)
(4,5)        4(1)
(5,2)        5(3), 6(3)
(5,4)        3
(5,6)        5(1)
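Rule II can be sketched in the same style. The Python fragment below takes the first negators of the earlier table as given data and, for each sequence, looks for negators on required major connectors whose source device is the origin of the connector or precedes it in the sequence. The name `second_negators` and the dictionary layout are assumptions made for this illustration.

```python
def second_negators(sequences, negators):
    """Apply Rule II.

    negators maps a line (r, t) to the devices negating it.
    Returns {(line, d): sorted sources of second negators on N(line, d)}.
    """
    second = {}
    for seq in sequences:
        k = seq[0]
        position = {dev: i for i, dev in enumerate(seq)}
        for origin, target in zip(seq, seq[1:]):  # required connectors
            for d in negators.get((origin, target), []):
                # The negator interferes only if d fires no later than
                # the origin of the connector it cuts.
                if d in position and position[d] <= position[origin]:
                    second.setdefault(((origin, target), d), set()).add(k)
    return {key: sorted(src) for key, src in sorted(second.items())}


SEQUENCES = [
    [3, 1, 6, 5, 2, 4],
    [4, 3, 1, 2],
    [5, 4, 3, 2, 1],
    [6, 5, 4, 3, 2, 1],
    [1, 2, 3, 4, 5, 6],
]

# First negators, copied from the table produced by Rule I.
NEGATORS = {
    (1, 2): [3], (1, 6): [1, 4, 5], (2, 4): [1], (3, 1): [5, 6],
    (3, 2): [3, 4], (3, 4): [3], (4, 5): [4], (5, 2): [5, 6],
    (5, 4): [3], (5, 6): [5],
}

SECOND = second_negators(SEQUENCES, NEGATORS)
for (line, d), sources in SECOND.items():
    print(f"negator from {d} on {line}: second negators from {sources}")
```

A device that never fires in sequence k (for example devices 5 and 6 in sequence 4312) cannot interfere there, which is why the check `d in position` is needed.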

SIMPLIFICATIONS. It is possible for redundancies to exist among
the  negators  and  second  negators.  That  is,  since  negating  devices  have 
been  chosen  as  the  first  in  the  sequence,  it  is  conceivable  that  another 
device  after  the  first  is  already  acting  satisfactorily  as  a  negator.  In 
this  case,  the  number  of  negators  and/or  second  negators  is  reducible. 


Example: Consider the negators from devices 5 and 6 for
line (3,1). In sequence 6, device 5 precedes device 3.
Therefore, 5 can provide the negation and the one from 6
can be eliminated. The same idea justifies the removal
of two second negations in (3,2), the ones from device 6 to
the negators from 3 and 4. The connections to the affected
MDC lines now appear as:

(3,1)    5
(3,2)    3(5), 4(5)

Another  possibility  for  reducing  the  number  of  negating  lines  is  to 
delay  the  action  until  the  last  possible  moment.  That  is,  if  line  (a,  b)  is 
negated  by  a,  c,  d,  ...  i,  device  a  might  serve  in  all  cases  and  c,  d,  .  .  .  i 
could  be  eliminated  since,  by  the  assumption  on  page  2,  negating  pulses 
always  travel  faster  than  pulses  along  regular  MDC  lines.  The  procedure 
will not always work if the negator from i has a second negation on it.
Example: Consider the negators from devices 1, 4 and 5
to line (1,6) and ignore, for the moment, the second
negator from 3 to 1. Then device 1 is sufficient to
negate (1,6) in the sequences beginning with 4 and 5 and
those two negators would be eliminated. However, as
is actually the case, the second negation from 3 in both
sequences 4 and 5 would reach the negating line from 1
before 1 itself is activated. This would prevent proper
negation of (1,6) in these two sequences.

Example: MDC line (3,2) is now negated by devices 3
and 4. In this case, the negation from 4 can be eliminated
since device 5, which causes a second negation of 3, does
not occur (at all) in sequence 4 in time to prevent proper
negation.

These  two  simplification  rules  can  be  incorporated  in  the  procedure 
to  reduce  the  number  of  MDC  lines. 


Example: The present example can be simplified to provide
a smaller number of connections.

MDC LINES    NEGATORS AND SECOND NEGATORS
(1,2)        3(4)
(1,6)        1(3), 4, 5
(2,4)        1(3)
(3,1)        5
(3,2)        3(5)
(3,4)        3(1)
(4,5)        4(1)
(5,2)        5(3)
(5,4)        3
(5,6)        5(1)

BIDIRECTIONAL CASE. In the bidirectional case, a pulse may
travel in either direction along an MDC line. Therefore, if a sequence
$d_1 d_2 \ldots d_{n-1} d_n$ is constructed with bidirectional lines, connections also
exist along the path $d_n d_{n-1} \ldots d_2 d_1$. These connections must be shown
in the connection matrix C and, in practice, this can be accomplished
quite simply by deriving C from S as before and forming a new C equal
to $C \cup C^T$.* Since the procedures in Rules I and II involve only operations
on C, no further changes have to be made to solve the problem for this case.
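Forming the bidirectional connection matrix is an element-wise "or" of C with its transpose. A short Python sketch follows; the function name `symmetrize` is an assumption made here.

```python
def symmetrize(C):
    """Return the union of C and its transpose: entry (i, j) is 1 when
    C has a 1 in position (i, j) and/or in position (j, i)."""
    n = len(C)
    return [[1 if C[i][j] or C[j][i] else 0 for j in range(n)]
            for i in range(n)]


# Unidirectional connection matrix of the running example.
C = [
    [0, 1, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]

C_BIDIRECTIONAL = symmetrize(C)
for row in C_BIDIRECTIONAL:
    print(*row)
```

The result is symmetric by construction, so each undirected line such as (2,4) appears twice, once in each direction.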

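Example: Using the same sequences as in the previous
example, we have

$$
C \cup C^T =
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
\cup
\begin{pmatrix}
0 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
=
\begin{pmatrix}
0 & 1 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 1 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
$$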
The two rules can now be applied to this new connection
matrix to give a list of negators and second negators.

MDC LINES    NEGATORS AND SECOND NEGATORS
(1,2)        3(4,5,6)
(1,3)        1, 5, 6
(1,6)        1(3), 4, 5, 6
(2,3)        3(5,6), 4(5,6)
(2,4)        1(3), 4(3), 5(3), 6(3)
(2,5)        1(3), 4, 5(3), 6(3)
(3,4)        3(1)
(4,5)        3(1), 4(1)
(5,6)        5(1)

*$C^T$ is the transpose of C. $C \cup C^T$ means the new matrix has a 1 in position
i, j if C and/or $C^T$ have a 1 in position i, j and 0 elsewhere.


Several  redundancies  can  be  removed  as  in  the  unidirectional  case. 
More  care  must  be  taken,  however,  since  a  line  indicated  for  example, 
as  (2,4)  now  means  a  connection  from  4  to  2  as  well  as  one  from  2  to  4. 


Example:  The  above  negators  and  second  negators  can 
be  reduced  to  the  following: 


MDC  LINES 

NEGATORS AND SECOND NEGATORS

(1.2) 

3(4) 

(1.3) 

1,5 

(1.6) 

3,(5) 

(2.3) 

1(3),  4 

(2,5) 

1(3),  4,  5(3) 

(3,4) 

3(1) 

(4,5) 

3(1),  4(1) 

(5.6) 

5(1) 

VALIDATION  PROCEDURE.  The  above  solution  guarantees  proper 
operation  of  the  sequences  barring,  of  course,  blunders  in  the  application 
of  the  rules.  A  FORTRAN  program,  supposedly  doing  away  with  this 
latter  possibility,  is  used  to  generate  the  first  and  second  negators  and 
its  listing  appears  in  Appendix  II. 

The  introduction  of  simplification  procedures  which  have  been  neither 
formalized  nor  mechanized  raises  the  possibility  of  introducing  logical 
errors  as  well  as  blunders  into  the  solution.  It  is  therefore  advisable  to 
check  that  these  changes  still  produce  the  required  sequences.  This  can 
be  done  by  considering  the  revised  list  of  major  connectors  and  first  and 
second  negators  and  following  the  sequence  of  actions  after  the  required 
initial  devices  are  activated.  The  procedure  is  straightforward:  for  each 
device,  activate  the  second  negators  it  controls,  then  the  still  active  first 
negators  and  then  the  still  active  major  connectors.  If  more  than  one 
major  connector  is  left  from  the  activated  device,  there  is  an  error.  If 
only  one  major  connector  is  left,  the  next  element  in  the  sequence  is 
identified,  and  the  procedure  repeated  for  it.  If  no  connectors  are  left, 
the  sequence  is  ended.  A  FORTRAN  program  (also  appearing  in  Appendix 
II)  has  been  written  to  mechanize  this  procedure  and  print  out  the  valid 
sequences.  Ambiguities  which  result  in  improper  functioning  are  also 
indicated. 
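
The checking procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the Appendix II program: the function name `trace_sequence`, the data layout (a set of directed connectors, a dictionary of named first negators, a list of second negators) and the three-device example are all hypothetical. A connector leading back to a device that has already functioned is ignored, which is how the validation listing appears to treat it.

```python
def trace_sequence(start, connectors, first_negators, second_negators,
                   max_len=20):
    """Follow the firing sequence that begins at device `start`.

    connectors      -- set of (a, b): major connector from device a to b
    first_negators  -- dict name -> (origin device, (a, b) connector destroyed)
    second_negators -- list of (origin device, name of first negator destroyed)
    Returns (sequence, status); status is "ok", "ambiguous" or "too long".
    """
    live_connectors = set(connectors)
    live_negators = dict(first_negators)
    sequence = [start]
    current = start
    while len(sequence) < max_len:
        # 1. Second negators controlled by the current device act first.
        for origin, name in second_negators:
            if origin == current:
                live_negators.pop(name, None)
        # 2. First negators that are still active destroy their connectors.
        for origin, line in live_negators.values():
            if origin == current:
                live_connectors.discard(line)
        # 3. Surviving connectors identify the next device.
        targets = {b for (a, b) in live_connectors
                   if a == current and b not in sequence}
        if len(targets) > 1:
            return sequence, "ambiguous"
        if not targets:
            return sequence, "ok"
        current = targets.pop()
        sequence.append(current)
    return sequence, "too long"


# Hypothetical system: connectors 1->2, 2->3 and an unwanted 1->3.
connectors = {(1, 2), (2, 3), (1, 3)}
negators = {"n1": (1, (1, 3))}      # device 1 destroys connector 1->3

good = trace_sequence(1, connectors, negators, [])
bad = trace_sequence(1, connectors, {}, [])
```

In this example `good` follows 1, 2, 3 and ends normally, while `bad` stops at device 1 because two connectors remain active.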
SUMMARY. A procedure has been demonstrated for generating a
set  of  negators  and  second  negators  which  is  sufficient  for  proper 
functioning  of  the  required  sequences.  It  does  not  produce  an  optimal 
solution  in  the  sense  of  minimizing  the  total  number  of  connections 
although,  in  individual  cases,  redundancies  can  be  eliminated.  Whether 
or  not  an  effective  general  procedure  for  minimizing  the  connections 
(short  of  enumerating  all  possible  combinations  and  selecting  the  smallest) 
exists  is  unknown  at  this  time. 
APPENDIX I


The  following  shows  that  Rule  I  produces  a  sufficient  set  of  negators. 
PROOF:

We have to show that the negators given by the rule prevent operation
of any device out of sequence.

Consider the required sequence k of length p: $k_1, k_2, \ldots, k_p$. $C_k$ has
been derived from C by means of the permutation

$$P_k = \begin{pmatrix} k_1 & k_2 & \cdots & k_p & \big| & n-p+1 & \cdots & n \\ 1 & 2 & \cdots & p & \big| & n-p+1 & \cdots & n \end{pmatrix}, \qquad 1 \le p \le n,$$

where n is the number of devices. There is a connection from device
$k_i$ to $k_j$ ($1 \le i, j \le n$) if and only if $c_k(i,j) = 1$ in $C_k$.

The rule calls for negation of all major connectors represented by
those $c_k(i,j)$ for which $j > i+1$ ($1 \le i \le p \le n-2$) as well as those for which
$i = p$ and $j = i+1$. This is to be done by negators originating from device
$k_1$. Since negators act before major connectors emanating from the
same device, the connectors described above are effectively non-existent
for this sequence and we may replace their representations $c_k(i,j)$ by 0,
forming the new matrix $C'_k$.

Now consider any required device $k_{i_0}$ ($1 \le i_0 \le n$) and remember that
it can initiate another device $k_j$ if and only if $c'_k(i_0, j) = 1$.

If $j < i_0$, $k_j$ has functioned before $k_{i_0}$ and the presence of a major
connector is of no consequence.

The case $j = i_0$ does not occur since no device is ever connected to
itself.

Since for all $j > i_0 + 1$, $c'_k(i_0, j)$ has been set to 0 (negated) by the rule,
none of these $k_j$ can be activated by $k_{i_0}$.

We are therefore concerned only with $j = i_0 + 1$, the first superdiagonal.

From the construction of $C_k$ and $C'_k$, $c'_k(i_0, i_0+1) = 1$ for $1 \le i_0 \le p-1$.

If $p = n$, only the first n-1 major connectors of the sequence remain
and the sequence functions properly.

If $p < n$, $c'_k(i_0, i_0+1) = 0$ for $i_0 = p$ since the rule calls for negation in
this case. Therefore, there is no connection from $k_p$ to any other device
and the sequence terminates as it should.

The  following  shows  that  Rule  II  provides  second  negators  which 
prevent  unwanted  first  negators  from  interfering  with  the  required  major 
connectors.  It  also  demonstrates  that  the  required  sequence  functions 
properly. 

PROOF:

For this rule, we must show:

A. that no extraneous device functions out of sequence, since the
second negators might conceivably destroy first negators given by Rule I.

B. that the second negators do, in fact, prevent first negators from
interfering with the required major connectors.

1. For any required sequence k, consider $C'_k$, which shows
all the system negators found by repeated application of
Rule I.

2. All first negations that are required in sequence k are
initiated by device $k_1$ (the first element in the sequence),
and these are all off the first superdiagonal of $C'_k$.

3. By Rule II, second negators from device $k_1$ affect only
the first p-1 elements which lie on the first superdiagonal
of $C'_k$.

4. When sequence k is called for, all first and second negators
from the initial device, $k_1$, function before anything else.
However, by (2) and (3) it can be seen that none of the
first negators used in the sequence are destroyed by second
negators from $k_1$. By a verbatim repetition of the proof
used for Rule I, it is now seen that no extraneous device
functions out of sequence.

5. The first p-1 elements on the first superdiagonal of $C'_k$
represent the major connectors which must function for
sequence k to operate properly. Consider any one of these
elements, say $c'_k(i_0, i_0+1)$, which has negators on it.
Let d be the origin of one of its negators. If d follows
$k_{i_0}$ in the sequence, the major connector functions
properly before the negator is initiated. If d precedes
$k_{i_0}$ in the sequence or is $k_{i_0}$, the second negator from
device $k_1$ (provided by Rule II) eliminates the first
negator before it can destroy the required major
connector.

6. Since the quantities d, $i_0$ and k in the above were all
arbitrary, the argument holds for all the sequences.
APPENDIX II


The  Program  for  the  mechanization  of  the  solution  was  written  in 
FORTRAN  II  and  run  on  a  UNIVAC  SS  -  90  card  system.  Several  comments 
concerning  the  procedures  and  conventions  are  necessary. 

1.  The  FORTRAN  listing  can  serve  as  its  own  flow  chart.  It 
contains  several  main,  non-overlapping  parts  (the  prefix  N 
is  used  to  identify  integer  arrays): 

a.  Set  up  sequence  matrix  NS 

b.  Set  up  connection  matrix  NC 

c.  Determine  first  negators 

d.  Determine  second  negators 

2.  It  was  found  convenient  to  include  all  existing  devices  (not  only  the 
required  ones)  in  NS.  The  non  required  ones  follow  the  sequence 
and  each  identifier  is  preceded  by  a  minus  sign.  Thus,  the 
sequence  4  3  1  2  of  our  example  is  actually  entered  as  4  3  1  2  -5  -6. 
The  order  of  the  added  devices  is  unimportant.  This  turned  out 

to  be  a  convenient  way  to  signal  the  routine  that,  even  though  the 
sequence  had  ended,  negators  might  still  be  required. 

3. Rather than rearrange matrix NC to conform with each permutation,
subscripted subscripts were used. That is, to select
elements of NC for testing, we have to look at individual NC(I, J)
in a specified order. These orders are given by the rows
of NS which contain the required sequences. I and J are both
functions of the elements in NS: I = NS(α, β), J = NS(γ, δ) for
arbitrary α, β, γ, δ. Therefore, given α, β, γ, δ, we can
find NC(I, J) as

NC(NS(α, β), NS(γ, δ))

4.  Information  on  each  negator  is  stored  in  an  array  indexed  by  the 
symbol  NG.  For  each  NG,  as  this  array  is  being  formed, 
another  5-position  sector  is  cleared  for  use  in  storing  the  origins 
of  at  most  5  second  negators.  Should  the  number  of  negators 
and  associated  second  negators  exceed  the  arbitrary  numbers 

of  50  and  5,  respectively,  one  "DO"  statement  will  have  to  be 
changed  in  addition  to  the  'Dimension'  statement. 

5. Throughout the program, several array elements which are used
more than once have been renamed without subscripts. This was
done to speed up the processor at the cost of what is hoped to
be a small decrease in readability.
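
The conventions in comments 2 and 3 can be illustrated with a short sketch. It is not a translation of the FORTRAN listing: the helper names `build_nc` and `nc_at` are hypothetical, Python lists are indexed from zero where the FORTRAN arrays start at one, and only the example sequence 4 3 1 2 -5 -6 is taken from the text.

```python
def build_nc(ns, n):
    """Connection matrix: 1 where one required device directly follows another."""
    nc = [[0] * n for _ in range(n)]
    for row in ns:
        for a, b in zip(row, row[1:]):
            if a > 0 and b > 0:          # negative entries are non-required
                nc[a - 1][b - 1] = 1
    return nc


def nc_at(nc, ns, alpha, beta, gamma, delta):
    """Subscripted subscripts: look up NC(NS(alpha, beta), NS(gamma, delta))."""
    i = abs(ns[alpha][beta])             # sign only marks required / not required
    j = abs(ns[gamma][delta])
    return nc[i - 1][j - 1]


# One sequence card: required devices 4 3 1 2, then the unused devices 5 and 6.
NS = [[4, 3, 1, 2, -5, -6]]
NC = build_nc(NS, 6)

follows = nc_at(NC, NS, 0, 0, 0, 1)      # is there a connector 4 -> 3 ?
```

The matrix NC is never reordered; the row of NS supplies the order in which its elements are visited.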
FORTRAN PROGRAM # 541

C     XPLOSIVE COMPUTER   SCHLACK/EISMAN   L8500
      DIMENSION NS(20,20),NC(20,20),NBEG(50),NEND(50),NI(50),NUM(50),NSEC(50,5)
      COMMON NS,NC,NBEG,NEND,NI,NUM,NSEC
C     SET UP SEQUENCE MATRIX NS
C     ONE SEQUENCE PER CARD, EVERY 4 SPACES. REMEMBER LEADING ZERO FOR ONE DIGIT
C     NUMBERS STARTING 3, 7,11,15,19,23,27,31,35,39,ETC
      READ 100,M,N,IND
C     M=NO. SEQUENCES. N=NO. DEVICES
C     IND=1 IF UNIDIREC. =2 IF BIDIREC
      DO 1 I=1,M
      READ 100,(NS(I,K),K=1,20)
      PRINT 101,(NS(I,K),K=1,20)
    1 CONTINUE
C     CLEAR CONNECTION MATRIX
      DO 19 J=1,N
      DO 19 I=1,N
      NC(I,J)=0
   19 CONTINUE
  100 FORMAT (20I4)
  101 FORMAT (5X,20I5)
C     PRINT HEADINGS
      PRINT 303
      PRINT 304
      PRINT 305
C     SET UP CONNECTION MATRIX NC
      DO 2 I=1,M                         FOR EACH SEQUENCE
      DO 3 J=1,N1                        FOR EACH DEVICE
      J1=J+1
      IF (NS(I,J1)) 2,2,6                IF A DEVICE FOLLOWS
    6 NC(NS(I,J),NS(I,J1))=1             INSERT 1 INTO NC
      IF (1-IND) 37,3,995                UNI OR BI DIRECTIONAL
   37 NC(NS(I,J1),NS(I,J))=1             ALSO IN XPOSE IF BIDIRECTIONAL
    3 CONTINUE
    2 CONTINUE
C     DETERMINE FIRST NEGATIONS
      NG=1                               INITIALIZE NEGATOR COUNTER
      DO 30 I=1,M                        FOR EACH SEQUENCE
      JSAVE=NS(I,1)                      FIRST ELEMENT OF SEQ I
C     DEPENDING ON COLUMN N+1 BEING 0
      DO 31 J=1,N1                       FOR EACH LINE
      JA=J
      NJ=NS(I,J)
      IF (NJ) 30,996,10                  CHECK IF ELEMENT NEG OR POS.
   10 IF (NS(I,J+1)) 9,997,8             IF OTHER END NEG., EXTRA NEGATION
    9 JA=J-1                             SET BEGIN'G OF SEARCH BACK 1
    8 DO 32 K=JA+2,N                     LOOK FOR NEGATIONS ABOVE SUPERDIAGONAL
      NNS=NS(I,K)
      IF (NNS) 4,31,5
    4 NNS=-NNS                           REVERSE SIGN NON-RQRD ELEM.
    5 IF (NC(NJ,NNS)) 998,32,7           IF 1, REQUIRES NEGATION
    7 NBEG(NG)=NJ                        BEGINNING NEGATED LINE
      NEND(NG)=NNS                       END NEGATED LINE
      NI(NG)=JSAVE                       NEGATED BY DEVICE NUMBER
      NUM(NG)=1
      DO 99 IJK=1,5                      CLEAR SECOND NEGATOR VECTOR FOR
      NSEC(NG,IJK)=0                     THIS NEGATOR
   99 CONTINUE
      NG=NG+1
   32 CONTINUE
   31 CONTINUE
   30 CONTINUE
C     DETERMINE SECOND NEGATIONS
      LL=NG-1                            LIMIT NO. NEGATORS SCANNED
      DO 15 I=1,M                        FOR EACH SEQUENCE
      DO 15 J=1,N1                       FOR EACH LINE
      NBP=NS(I,J)
      NEP=NS(I,J+1)
      DO 13 L=1,LL                       FOR EACH NEGATOR
      IF (NBP-NBEG(L)) 96,12,96          CHECK BEGINNING
   96 IF (1-IND) 97,13,995               UNI- OR BI- DIRECTIONAL
   12 IF (NEP-NEND(L)) 13,14,13          CHECK END
   97 IF (NBP-NEND(L)) 13,98,13          CHECK CONNECTOR IN
   98 IF (NEP-NBEG(L)) 13,14,13          OTHER DIRECTION
   14 DO 21 K=1,J                        FOR ALL ELEMENTS BEFORE
      IF (NI(L)-NS(I,K)) 21,22,21        DOES NEGATING INDEX APPEAR
   22 NSEC(L,NUM(L))=NS(I,1)             SAVE SECOND NEGATOR
      NUM(L)=NUM(L)+1
      GO TO 13
   21 CONTINUE
   13 CONTINUE
   15 CONTINUE
      DO 500 I=1,LL
      PRINT 300,I,NBEG(I),NEND(I),NI(I),(NSEC(I,J),J=1,5)
  500 CONTINUE
  300 FORMAT (5X,I5,3X,2I5,5X,I5,5X,5I5)
  305 FORMAT (16X,9HFROM   TO,8X,2HBY,5X,14HSECOND  NEG'S./)
  304 FORMAT (32X,3HNEG)
  303 FORMAT (2/)
      STOP
  998 PAUSE 998
  997 PAUSE 997
  996 PAUSE 996
  995 PAUSE 995


C     USS FORTRAN II *** VERSION 9000 22 JAN 63
C     COMPILED 7/28/67
C     FORTRAN PROGRAM # 509
C     4-28-67 SHE L8500
C     XPLOSIVE COMPUTER- UNIDIRECTIONAL - VALIDATION PROCEDURE
C     ACCEPTS MAJOR CONNECTORS, NEGATORS AND SECOND NEGATORS IN
C     FORMAT OF OUTPUT FROM SOLUTION PROGRAM (541)
C     ALL MAJOR CONNECTORS PRESENT, WHETHER NEGATED OR NOT,
C     MUST BE INCLUDED
C     PROGRAM ACCEPTS DEVICE NO. AND PRODUCES EITHER
C     1. THE UNIQUE SEQUENCE OR
C     2. INDICATION OF AMBIGUITY
      DIMENSION NBEG(50),NEND(50),NI(50),NSEC(50,5),NS(20)
      COMMON NBEG,NEND,NI,NSEC,NS
      B1=5HTOO L
      B2=5HONG
      C1=5HAMBIG
      C2=5HUOUS
      READ 900,NGM
      DO 1 I=1,NGM
      READ 901,NBEG(I),NEND(I),NI(I),(NSEC(I,J),J=1,5)
    1 CONTINUE
  101 READ 900,K
      A1=5H
      A2=5H
      NS(1)=K
      DO 2 I=2,20
      NS(I)=0
    2 CONTINUE
      DO 3 I=1,NGM
      NBEG(I)=ABS(NBEG(I))
      NI(I)=ABS(NI(I))
    3 CONTINUE
      DO 20 KK=2,20
      NA=0
      DO 14 I=1,NGM
      DO 13 J=1,5
      IF (NSEC(I,J)) 910,14,11
   11 IF (NSEC(I,J)-K) 13,12,13
   12 IF (NI(I)) 14,910,112
  112 NI(I)=-NI(I)
      GO TO 14
   13 CONTINUE
   14 CONTINUE
      DO 16 I=1,NGM
      IF (NI(I)-K) 16,15,16
   15 NBEG(I)=-NBEG(I)
   16 CONTINUE
      DO 19 I=1,NGM
      IF (NBEG(I)-K) 19,17,19
   17 NZ=NEND(I)
      DO 117 L=1,KK-1
      IF (NZ-NS(L)) 117,19,117
  117 CONTINUE
      DO 317 II=1,NGM
      IF (NBEG(II)+K) 317,217,317
  217 IF (NEND(II)-NZ) 317,19,317
  317 CONTINUE
      IF (NA) 910,218,118
  118 IF (NA-NZ) 24,19,24
  218 NA=NZ
   19 CONTINUE
      IF (NA) 910,30,21
   21 NS(KK)=NA
      K=NA
   20 CONTINUE
      A1=B1
      A2=B2
      GO TO 30
   24 A1=C1
      A2=C2
   30 PRINT 902,A1,A2,(NS(I),I=1,KK)
      GO TO 101
  900 FORMAT (I4)
  901 FORMAT (3X,8I5)
  902 FORMAT (10X,2A5,20I4)
  910 PAUSE 910
      STOP
      END


PROBLEM SOLVING BY DIGITAL-ANALOG SIMULATION*

Howard  M.  Bloom 
Computation  and  Analysis  Branch 
Harry  Diamond  Laboratories 
Washington,  D.  C. 

ABSTRACT. An evaluation of four simulation languages, MIDAS,
APACHE, MIMIC, and DSL/90, is made to determine their relative
merits. The application of analog computer techniques to digital-analog
simulation is considered. The problems discussed are as follows: solution
of a set of linear algebraic equations, linear programming, hybrid
simulation, partial differential equations, boundary value problems, parameter
optimization using a least-squares error criterion, and roots of polynomial
equations. A mathematical outline of the technique or problem is given
as well as the digital program, written in DSL/90, which is used to
represent the problem. Possible improvements in the simulation language
are shown. The suggestions presented include the ability to
dimension variables and a means of using an iteration technique.


*This  report  will  be  published  in  full  1  January  1968  as  TR-1357  of  the 
Harry  Diamond  Laboratories. 
A SHELL COMPUTER PROGRAM WHICH DETERMINES THE
PHYSICAL PROPERTIES OF AN ARTILLERY SHELL AND
REPRESENTS ITS DIMENSIONS GRAPHICALLY

Forrest  McMains 

Picatinny  Arsenal,  Dover,  New  Jersey 

The  purpose  of  this  presentation  is  to  describe  a  digital  computer 
program  which  determines  the  physical  properties  of  artillery  shells  and 
related  items. 

I  have  chosen  to  speak  on  this  program  for  two  main  reasons: 

First, the program is used daily at Picatinny Arsenal both in
experimental design work and in the analysis of end items. Since it is
used primarily by people who are not computer oriented, extreme care
had to be taken in writing the input-output operations. The input data had
to be clear and concise. The output information had not only to be
complete, including as many helpful and meaningful results as possible, but
it also had to be kept brief.

Second, the program is able to handle large amounts of data in an
almost error-free manner. Special care has been taken so that every
dimension of the shell (the input data to the program) can be enumerated
logically and quickly. The resulting graph (which is nothing more than a
picture of these dimensions) serves as an excellent check on the input
values. A mere glance at the picture of the shell is usually sufficient to
detect any input error. A further check, in most cases the final one,
consists of comparing this picture to the original blueprint of the shell.


An artillery shell is formally defined as a hollow projectile, designed
to be given an explosive, a chemical or other filler and fired from a
weapon. It is composed of body pieces (which are frustums of right
circular cones and cylinders); ogive pieces (the curved, forward part of
the projectile, including its pointed end); and fins (fixed or adjustable
airfoils attached to the projectile and parallel to the plane of symmetry,
which afford directional stability).

Each card of input to the program consists of the four to six
dimensions of one piece plus an identification of this piece.

Figure  1  defines  a  body  piece.  Each  body  piece  has  four  dimensions: 

AB,  the  radius  of  the  end  closest  to  the  reference  axis; 

BB,  the  radius  of  the  opposite  end; 
HB, the length; and
RB,  the  reference. 

A  reference  axis  must  be  chosen  before  any  data  is  collected.  Once 
this  axis  is  selected,  every  shell  piece  must  be  referenced  to  it. 

Two  other  parameters  appear  on  the  body  item  card  of  input:  the 
density  of  the  material  used,  and  the  identification  of  the  item. 

Figure 2 defines a fin item. Its dimensions are analogous to those
comprising the body item: AF and BF are radii; RF is the reference;
and HF is the length. Besides density and an identification, a third
parameter, its thickness, is also needed.

Figure 3 defines an ogival item. The parameters AV and BV are the
X and Y coordinates of the origin of the ogival system of
the (circular) arc. RD is the radius of the arc; RV and HV are the
reference and length values.

The  three  examples  shown  here  illustrate  how  the  arc  is  suspended 
when  AV  and  BV  vary  in  sign. 

In addition to these three items (body, fin and ogive), the program will
also accept a fourth item: a known piece. That is, a piece, or any group
of pieces, for which the weight, moments of inertia and center of gravity
to the reference are known. This item will be included in the analysis
with the other (unknown) pieces.

Output from the program is divided into five parts.

The  first  part  is  the  graph  of  the  shell.  It  is  a  true  representation  of 
all  the  input  data  and  should  compare  exactly  to  the  blueprint. 

Figure  4  shows  an  example  of  the  graphic  output.  This  particular  shell 
is  composed  of  76  body  pieces  and  4  fins;  a  total  of  80  cards  of  input. 

The scaling used in this case is 1/2 unit to the inch. Scaling is at
the discretion of the user. If no scaling is specified, the best possible
scaling will be used; that is, scaling which will produce a reasonably
sized graph, with height to diameter (X to Y direction) in the ratio of 1 to 1 and
the units per inch in some workable amount such as 1 unit to the inch, 2 units,
1/2 unit, 1/4 unit, etc.

The second part of output consists of a listing of all the input data, card
by card, with a brief explanation of the options requested.
The third part gives the corresponding properties of each item
considered independently. These properties include weight, "transfer
effect" moments of inertia, center of gravity to the reference and volume.
"Transfer effect" is the sum of the products of the weight and distance
squared of each weight element of the item from its own center of gravity.
The transfer effect is an intermediate quantity required to determine the
total moment of inertia of the shell. This quantity is useful to know if
revisions by hand are to be made on a shell after the computer has
calculated its properties.

The  fourth  part  of  output  gives  the  properties  of  the  entire  shell;  the 
total  weight,  moments  of  inertia,  and  the  center  of  gravity.  The  center 
of  gravity,  besides  being  printed,  is  also  indicated  on  the  graph  of  the 
shell,  as  can  be  seen  on  Figure  4. 
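
How the properties of single pieces combine into those of the whole shell can be sketched for body pieces. The sketch rests on assumptions that the paper does not state: each body piece is treated as a solid frustum of uniform density, RB is taken to be the distance from the reference to the end of radius AB, and the function names are hypothetical. Only the standard volume and centroid formulas for a frustum are used; moments of inertia are left out.

```python
import math


def frustum_properties(ab, bb, hb, rb, density):
    """Weight, center of gravity (from the reference) and volume of a body piece.

    ab, bb -- end radii; hb -- length; rb -- assumed distance from the
    reference to the end of radius ab. At least one radius must be nonzero.
    """
    s = ab * ab + ab * bb + bb * bb
    volume = math.pi * hb * s / 3.0
    # Centroid of a solid frustum, measured from the end of radius ab.
    xbar = hb * (ab * ab + 2.0 * ab * bb + 3.0 * bb * bb) / (4.0 * s)
    return density * volume, rb + xbar, volume


def combine(pieces):
    """Total weight and center of gravity of (weight, cg, volume) tuples."""
    total_weight = sum(w for w, _, _ in pieces)
    cg = sum(w * x for w, x, _ in pieces) / total_weight
    return total_weight, cg


# Two identical cylinders placed end to end along the reference axis.
first = frustum_properties(1.0, 1.0, 2.0, 0.0, 1.0)
second = frustum_properties(1.0, 1.0, 2.0, 2.0, 1.0)
shell_weight, shell_cg = combine([first, second])
```

Keeping the per-piece results and summing them afterwards mirrors the split between the third and fourth parts of the output, and makes it easy to redo the totals by hand when one piece changes.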

The fifth part of output is the "Subtotal Sheet." For any piece on
the subtotal sheet, the properties given are the sum of all those
properties for all the preceding pieces. This feature is very useful if revisions
are to be made on the shell. It enables the user to perform a sectional
analysis so that altering any piece or group of pieces to achieve a
certain total weight, moment or volume is greatly simplified.

Figure  5  is  an  illustration  of  a  shell  which  contains  ogive  pieces. 

The data for ogive pieces is particularly error prone. Very often the
center, or direction, of the arc has been incorrectly determined. The
graph of the ogive is usually sufficient to point out these errors.

Figure  6  is  an  illustration  of  a  shell  which  contains  an  input  data  error. 
This  error,  occurring  between  heights  16  and  17,  is  clearly  visible  and 
eliminates  the  necessity  of  checking  the  almost  200  input  cards  needed 
for  this  run. 

In  order  to  run  this  program,  three  input  cards  are  needed,  followed 
by  the  body,  fin,  ogive  and  known  items  (one  card  per  item). 

The  first  card  is  used  for  a  title.  The  information  written  on  this 
card  will  appear  on  the  output  sheets  and,  if  desired,  on  the  graph  as  well. 

The  second  card  is  the  option  control  card.  Here  are  given  options 
governing  five  general  areas. 

1.  Graph  or  no  graph  output; 

2.  Scaling  on  the  graph,  which  has  already  been  described; 


3. Size of the graph. This option, if specified, will cause an
11 by 11 inch graph to be produced. Of course, in this case,
reasonable scaling must be forsaken for size;

4.  A  format  option.  Normally,  the  field  width  for  each  dimension 
is  10.  However,  with  this  option  it  is  possible  to  punch  the 
dimensions  using  no  field  width,  but  instead  by  separating  each 
number  by  a  comma.  Also,  whole  numbers  need  not  have 
decimals  and  E-type  numbers  are  acceptable;  and 

5.  The  last  option  concerns  dimension  change  in  subsequent  runs. 

Often,  especially  when  a  shell  is  in  the  design  stage,  it  is  important 
to  know  what  happens  when  certain  dimensions  are  varied,  deleted  or 
added.  This  option  will  cause  the  computer  to  hold  all  input  values  after 
the  first  run  and  then  to  pick  up  any  deletions,  changes  or  additions  on  the 
second,  third,  fourth,  etc.  ,  runs. 

If  the  option  control  card  is  left  blank,  the  field  width  format  is  set 
at  10,  a  one  unit  to  the  inch  graph  will  be  produced  and  the  program  will 
consider  each  run  independent. 
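
The two input formats of option 4 can be compared in a short sketch. The function names, the treatment of a blank fixed-width field as zero, and the sample cards are assumptions for illustration; the paper specifies only the field width of 10, the comma separator, and that whole numbers and E-type numbers are accepted.

```python
def parse_fixed_width_card(card, count, width=10):
    """Read `count` dimensions, each punched in a field `width` columns wide."""
    fields = [card[i * width:(i + 1) * width] for i in range(count)]
    # Assumption: a blank field is read as zero.
    return [float(f) if f.strip() else 0.0 for f in fields]


def parse_free_format_card(card):
    """Read comma-separated dimensions; '2' and '3E-1' are both accepted."""
    return [float(field) for field in card.split(",") if field.strip()]


fixed_card = "{:>10}{:>10}{:>10}".format("1.5", "2.0", "0.3")
free_card = "1.5,2,3E-1"

fixed_values = parse_fixed_width_card(fixed_card, count=3)
free_values = parse_free_format_card(free_card)
```

Both cards yield the same three dimensions; the free format simply removes the need to align each number in its field.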

The  third  card  of  input  is  the  Index  Card.  Here  is  given  the  number 
of  body,  fin,  known  and  ogive  items.  Also,  the  number  of  pieces  each 
fin  is  sectioned  into  is  given,  as  well  as  the  total  number  of  copies  of 
output  and  subtotal  sheets  desired.  Certain  values  on  this  card  may  be 
left  blank,  if  desired. 

For example, if the number of body pieces is not specified, the
program will scan the next card for a "B", which means that the following
cards are body items. The body items, in this case, will be terminated
with a blank card.

If  no  "B"  is  found,  the  program  will  assume  that  there  are  no  body 
items  in  the  run.  The  same  holds  true  for  ogive,  fin  and  known  items. 

This  feature  eliminates  the  necessity  of  counting  the  number  of  pieces 
(and  hence  cards)  in  any  one  group. 

In conclusion: this shell program is not particularly new to Picatinny
Arsenal. It has been in use since 1964 and has proved very useful in both
designing and evaluating artillery shells. Its output is readily accepted
by other computer programs at the Arsenal, such as trajectory and stability
programs.

It was originally written in FORTRAN IV for the IBM 7090 and has
since been converted for use on the IBM 360, Models "40" and "65". The
plotter used is CALCOMP, Model 570/563, magnetic tape. The program
is fully described in a Picatinny Arsenal Technical Report, Number 3327.
ZERPOL, A ZERO FINDING ALGORITHM FOR POLYNOMIALS USING
LAGUERRE'S METHOD*


Brian  T.  Smith 

Department  of  Computer  Science 
University  of  Toronto 


ABSTRACT. ZERPOL is a subroutine which computes the N zeros of the
polynomial P(z) when given just its real coefficients A(I):

$$P(z) = A(1)z^N + A(2)z^{N-1} + \cdots + A(N)z + A(N+1).$$

The zeros are stored in the complex array Z with the complex zeros appearing
in complex conjugate pairs. Except for polynomials of degrees one and two,
ZERPOL iterates towards a zero using Laguerre's method, which is cubically
convergent for isolated zeros and linearly convergent for multiple zeros.
The maximum length of the step between successive iterates is restricted
so that the iterate $x_{j+1}$ lies inside a certain region about the iterate $x_j$
proved to contain a zero of the polynomial. An iterate is accepted as a
zero when the polynomial value at that iterate is smaller than a computed
bound for the rounding error in the polynomial value at that iterate. The
original polynomial is deflated after each real zero or pair of complex
zeros is found, and subsequent zeros are found using the deflated polynomial.


INTRODUCTION. The problem is to find the N zeros $z_j$ of the given
polynomial

$$P(z) = \sum_{j=0}^{N} u_j z^{N-j}$$

that satisfy

$$P(z) = u_0 \prod_{j=1}^{N} (z - z_j).$$

The algorithm ZERPOL is intended to solve this problem. The algorithm is
described in two sections. Section one gives a summary of the strategy
used and section two describes some of the pertinent details about the
implementation of this strategy in FORTRAN IV on an IBM 7094-II.


Laguerre's method is defined now: Starting with an arbitrary complex
point $x_0$, Laguerre's method generates a sequence of iterates $(x_j)$ for the
polynomial P(z) given by

$$x_{j+1} = x_j + \mathcal{L}(x_j),$$

where $\mathcal{L}(x_j)$ is the Laguerre step at $x_j$ and equals

$$\mathcal{L}(x_j) = \frac{-N\,P(x_j)}{P'(x_j) \pm \sqrt{(N-1)^2 P'(x_j)^2 - N(N-1)\,P(x_j)\,P''(x_j)}},$$

the $\pm$ sign being chosen so as to maximize the denominator's magnitude. (See
Wilkinson (1965) for a development of Laguerre's method.)

*"The program Zerpol discussed in this article was developed under the
direction of Professor William M. Kahan, University of Toronto, Toronto,
Canada. This material was presented at the Conference by Professor Kahan
who described the rationale for the program Zerpol described here by Mr.
Smith." The next article in these Proceedings was submitted by Dr. Kahan
and is intended to support the material in this article.
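
The Laguerre step defined above can be written out directly. The following is a bare sketch of the iteration only, not of ZERPOL: the function names are hypothetical, the stopping test is a plain tolerance on the step length rather than ZERPOL's rounding-error bound, and there is no step restriction, scaling or deflation. If both candidate denominators vanish, the division raises `ZeroDivisionError`.

```python
import cmath


def poly_and_derivatives(u, x):
    """Return P(x), P'(x), P''(x) for coefficients u = [u0, ..., uN]."""
    p, dp, ddp = complex(u[0]), 0j, 0j
    for c in u[1:]:
        ddp = ddp * x + 2.0 * dp     # uses the previous dp
        dp = dp * x + p              # uses the previous p
        p = p * x + c
    return p, dp, ddp


def laguerre_step(u, x):
    """Laguerre step at x for the polynomial with coefficients u."""
    n = len(u) - 1
    p, dp, ddp = poly_and_derivatives(u, x)
    if p == 0:
        return 0j                    # x is already a zero
    root = cmath.sqrt((n - 1) ** 2 * dp * dp - n * (n - 1) * p * ddp)
    plus, minus = dp + root, dp - root
    # Choose the sign that maximizes the magnitude of the denominator.
    denominator = plus if abs(plus) >= abs(minus) else minus
    return -n * p / denominator


def find_zero(u, x0, tol=1e-14, max_iter=50):
    """Iterate x <- x + step until the step is negligible."""
    x = complex(x0)
    for _ in range(max_iter):
        step = laguerre_step(u, x)
        x += step
        if abs(step) < tol:
            break
    return x


zero = find_zero([1.0, 0.0, 0.0, -1.0], 2.0)   # P(z) = z^3 - 1
```

Starting from 2, the iterates move to the real zero at 1 within a few steps, consistent with the cubic convergence quoted for isolated zeros.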

SUMMARY OF THE STRATEGY USED IN ZERPOL. The overall strategy of ZERPOL
is described now. Polynomials of degree $N \le 2$ and polynomials whose leading
or trailing coefficients vanish are treated separately. The coefficients
of the polynomial are scaled upward as far as possible so that spurious
underflow does not occur when the polynomial is evaluated near a zero. ZERPOL
first attempts to start the iterative procedure at the origin. If the origin
is not an acceptable initial iterate, trial initial points in a certain
annular region around the origin are tested until a suitable initial iterate is
found. Subsequent iterates are restricted in order that the modulus of the
polynomial decreases from one iterate to the next and that the
distance between successive iterates is not too large. The sequence of
iterates terminates when the modulus of the polynomial becomes negligible.
The polynomial is deflated by the final iterate and the iteration procedure
is repeated using the deflated polynomial.

Specific details of the strategy are described now. The zeros of
polynomials of degree $N \le 2$ are computed using the standard closed formulas. The
quadratic equation solver subroutine QDRTC (A,B,C,ZS,ZL) is used to compute
the complex roots ZS and ZL of any real quadratic equation

$$Az^2 + Bz + C = 0$$

that must be solved by ZERPOL. Unless overflow or underflow occurs, the real and
imaginary components of ZS and ZL are computed within an accuracy of 2.25
units in their last place, and $|ZS| \le |ZL|$ within the specified accuracy of
these roots. Overflow and underflow occur only when the exact roots
overflow or underflow.
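
A quadratic solver with the same calling pattern can be sketched as follows. This is not the QDRTC routine: the lower-case name `qdrtc`, the return convention and the method (computing the larger root first and obtaining the smaller one from the product of the roots, which avoids cancellation) are assumptions for illustration. The sketch makes no claim to the 2.25-unit accuracy stated above and does not guard against overflow in `b * b`.

```python
import cmath


def qdrtc(a, b, c):
    """Roots (zs, zl) of a*z**2 + b*z + c = 0, real a != 0, |zs| <= |zl|."""
    disc = cmath.sqrt(b * b - 4.0 * a * c)
    sign = 1.0 if b >= 0 else -1.0
    # b and sign*disc are added with like signs, so no cancellation occurs.
    q = -(b + sign * disc) / 2.0
    if q == 0:
        return 0j, 0j                # b = c = 0: double zero at the origin
    zl = q / a
    zs = c / q                       # from the product of the roots, c / a
    if abs(zs) > abs(zl):
        zs, zl = zl, zs
    return zs, zl


small, large = qdrtc(1.0, -3.0, 2.0)          # zeros 1 and 2
conj_a, conj_b = qdrtc(1.0, 0.0, 1.0)         # zeros +i and -i
tiny, huge = qdrtc(1.0, -1.0e8, 1.0)          # widely separated zeros
```

In the last case the textbook formula would lose most of the digits of the small root to cancellation; taking it from the product of the roots keeps it accurate.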

For the remainder of this description we assume that the N + 1 real
coefficients $u_j$ are given for the polynomial

$$P(z) = \sum_{j=0}^{N} u_j z^{N-j}$$

such that $u_0 \neq 0$ and $N \geq 3$.

(Whenever $u_0 = 0$, the zero $z_N$ is set to the largest number in the machine,
an overflow message is enabled and the polynomial P(z) is treated as a
polynomial of degree N-1. If $u_N = 0$, $z_N$ is set to zero and the polynomial
P(z) is treated as a polynomial of degree N-1.)

First, the coefficients $u_j$ are scaled so that

$$\max_{0 \le j \le N} |u_j| \ge 2^{101}.$$


Scaling the coefficients in this manner reduces the possibility of underflow
of P(z) near a zero. However, the underflow condition cannot be completely
eliminated, as shown by the following example:

$$2^{-128}(z^{61} - z^{60}) + 2^{126}(z^{31} - z^{30}) + 2^{-128}(z - 1).$$

This polynomial cannot be evaluated near any of its zeros, namely 1 and
$2^{\pm 127/15} e^{(2k+1)\pi i/30}$ for k = 1, 2, ..., 30, without using
numbers that overflow $2^{127}$ or underflow $2^{-129}$, the limits on the
IBM 7094-II.


Next, an annular region about the origin known to contain the smallest
zero of the polynomial is computed. The radius of the inner circle is the
Cauchy lower bound R, namely the positive zero of the polynomial

$$S(z) = \sum_{j=0}^{N-1} |u_j| z^{N-j} - |u_N|.$$

The radius of the outer circle is the minimum of the geometric mean
$G = |u_N/u_0|^{1/N}$ of the magnitudes of the zeros, the Fejer bound $|F|$,
the Laguerre bound $|\mathcal{L}|$ and the Cauchy upper bound. Details
concerning the computation of these bounds will be given later.


This annular region, known to contain a zero of the polynomial, is used to
find an acceptable initial iterate for the iteration procedure. The strategy
is first to attempt to start the iteration procedure at the origin. The
origin is accepted as an initial iterate whenever the Laguerre step from the
origin lies within the outer circle of the annulus. Otherwise the origin is
unacceptable as an initial iterate and a search of this annular region for
an initial iterate is started. A trial point $x_0$ in this annular region is
accepted as an initial iterate whenever the next iterate
$x_1 = x_0 + \mathcal{L}(x_0)$ roughly lies within the annulus. The trial
points lie on four equiangular spirals about the origin starting on the
inner circle of this annular region.

Once a suitable initial iterate has been found, subsequent iterates are
determined by the following conditions: for j = 0, 1, ...

(1) $x_{j+1} = x_j + L(x_j)$, and

    $|P(x_j)| > |P(x_{j+1})|$,

where $L(x_j)$ may be a modified Laguerre step, and

(2) $x_{j+1} + \mathcal{L}(x_{j+1})$ roughly lies inside a circular region
about the iterate $x_j$ of radius $|F|$ known to contain a zero of P(z) (i.e.
$|\mathcal{L}(x_{j+1})| \le |F|$), and the modified Laguerre step is

$$L(x_{j+1}) = \begin{cases}
\mathcal{L}(x_{j+1}) & \text{when } |\mathcal{L}(x_{j+1})| \le |F|/2, \\
(|F|/2)\,\mathcal{L}(x_{j+1})/|\mathcal{L}(x_{j+1})| & \text{when } |F|/2 < |\mathcal{L}(x_{j+1})| \le |F|.
\end{cases}$$

The modified Laguerre step may be further modified when condition (1) is not
satisfied. If $|P(x_j + L(x_j))| \ge |P(x_j)|$ then $L(x_j)$ is replaced by
$L(x_j)/2$ and condition (1) is retested. This process is repeated until
condition (1) is satisfied. If $\mathcal{L}(x_{j+1})$ is too large (that is,
$|\mathcal{L}(x_{j+1})| > |F|$) then $L(x_j)$ is again replaced by $L(x_j)/2$
and conditions (1) and (2) are retested. The process is repeated until both
conditions (1) and (2) are satisfied. (These conditions are based on theorems
due to W. Kahan. See also B.T. Smith (1967).)
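
The two step controls described above can be sketched separately. This is an illustration only, assuming the Fejer bound |F| and the raw Laguerre step have already been computed; the names `limit_step` and `halve_until_decrease` are hypothetical, and the coupling between the two tests (retesting condition (2) at each new trial point) is left out.

```python
def limit_step(step, fejer):
    """Condition (2): keep, shorten, or reject a raw Laguerre step.

    Returns the step unchanged when |step| <= fejer/2, a step of modulus
    fejer/2 in the same direction when fejer/2 < |step| <= fejer, and
    None when |step| > fejer (the caller must then halve the previous step).
    """
    size = abs(step)
    if size <= 0.5 * fejer:
        return step
    if size <= fejer:
        return step * (0.5 * fejer / size)
    return None

def halve_until_decrease(p, x, step, max_halvings=60):
    """Condition (1): halve the step until |p(x + step)| < |p(x)|."""
    fx = abs(p(x))
    for _ in range(max_halvings):
        if abs(p(x + step)) < fx:
            return step
        step = step / 2
    raise ArithmeticError("no decrease found; x may already be a zero")
```

The cap `max_halvings` is an assumption added so that the sketch always terminates; the paper does not state such a limit.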

The iteration procedure stops whenever the polynomial value at an iterate
becomes smaller than a bound on the rounding error in the polynomial value
computed at that iterate. For a real point X, we can show that a bound for
the rounding error in the computed value of P(X) using the Newton-Horner
recurrence is given by

$$|P(X) - Q_N| \le \sigma c E$$

where

(1) the numbers $Q_j$ for j = 0, ..., N are the computed values for $q_j$
obtained from the Newton-Horner recurrence

$$q_0 = u_0 \quad\text{and}\quad q_j = u_j + X q_{j-1} \quad\text{for } j = 1, \ldots, N;$$

(2) $E = \sum_{j=0}^{N} |Q_j|\,|X|^{N-j}$;

(3) $\sigma$ equals a unit in the last place in the arithmetic used to compute
$Q_N$; and

(4) c is a machine constant, of the order of 10 for the IBM 7094-II,
representing the roundoff errors in the arithmetic used to compute $Q_N$.

Since a zero of P(z) need not be representable in the machine, we really
want a bound for $|P(x) - Q_N|$ where x is in the neighborhood of X, that is,
$|x - X| \le \sigma |X|$. We can show that whenever $|x - X| \le \sigma |X|$
then $|P(x) - P(X)| \le \sigma E$, so that $|P(x) - Q_N| \le \sigma (c+1) E$.

We summarize the results of this error analysis by saying that we cannot
distinguish any point X for which $|Q_N| \le \sigma (c+1) E$ from a zero of
the polynomial P(z), and that the machine representable numbers which are
immediate neighbours of a zero x of P(z) satisfy $|Q_N| \le \sigma (c+1) E$.
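
The stopping test can be sketched as a Horner loop that accumulates E alongside the value. The function name `horner_with_bound` is hypothetical, and the defaults are assumptions: c = 10 is the order of magnitude quoted for the IBM 7094-II and the unit in the last place is taken as the IEEE double precision epsilon, neither of which is claimed to be a rigorous constant for modern hardware.

```python
import sys

def horner_with_bound(u, x, c=10.0, sigma=sys.float_info.epsilon):
    """Return (Q_N, sigma*(c+1)*E) for coefficients u (highest power first).

    E is accumulated as E_j = |Q_j| + |x|*E_{j-1}, which equals
    sum(|Q_j| * |x|**(N-j)) at the end of the loop.
    """
    q = u[0]
    e = abs(q)
    for coeff in u[1:]:
        q = coeff + x * q          # Newton-Horner recurrence
        e = abs(q) + abs(x) * e    # running error-bound sum
    return q, sigma * (c + 1.0) * e
```

A point is treated as indistinguishable from a zero when the magnitude of the first returned value does not exceed the second.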


The numbers $q_j$ are the coefficients of the quotient polynomial Q(z) in
the division of P(z) by the factor z - X. That is,

$$P(z) = \sum_{j=0}^{N} u_j z^{N-j}
       = (z - X) \sum_{j=0}^{N-1} q_j z^{N-1-j} + q_N
       = (z - X)\,Q(z) + q_N.$$

Therefore the first derivative of P(z) at X can be obtained by applying the
Newton-Horner recurrence to the coefficients $q_j$. Thus

$$Q(z) = (z - X)\,W(z) + w_{N-1} \quad\text{and}\quad P'(X) = w_{N-1}.$$

Similarly for the second derivative P''(X),

$$W(z) = (z - X)\,V(z) + v_{N-2} \quad\text{and}\quad P''(X) = 2 v_{N-2}.$$

Notice that the error bound E, the polynomial value and its derivatives can
all be computed within the same loop.
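
A minimal sketch of the single-loop evaluation follows. The function name `horner_derivatives` is hypothetical; the error bound E is omitted here to keep the three nested recurrences visible, and everything is done in one precision, whereas ZERPOL mixes double and single precision.

```python
def horner_derivatives(u, x):
    """Return (P(x), P'(x), P''(x)) for coefficients u, highest power first.

    q, w and v follow the recurrences for P, Q and W; on exit
    q = q_N, w = w_{N-1} and v = v_{N-2}.
    """
    q = w = v = 0.0
    for coeff in u:
        v = w + x * v      # uses the previous w
        w = q + x * w      # uses the previous q
        q = coeff + x * q
    return q, w, 2.0 * v
```

For P(z) = z^3 - 2z + 5 at x = 2 this gives P = 9, P' = 10 and P'' = 12.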


The evaluation of the polynomial value, its derivatives and the error
bound E at a complex iterate Z is obtained in a similar manner to the real
iterate X by replacing each occurrence of the linear factor (z - X) by the
real quadratic factor $(z - Z)(z - \bar{Z})$, where $\bar{Z}$ is the complex
conjugate of Z. (This evaluation procedure for complex points appears in
Wilkinson (1965), pages 447-449.)
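
The division by the real quadratic factor can be sketched as follows, for the polynomial value only. The function name `eval_at_complex` is hypothetical. The last step of the recurrence uses x instead of 2x, following the remainder form Z(N-1)*(Z-X) + Z(N) quoted in the comments of the listing; derivatives, the error bound and the scaling for very large or very small |Z| are left out.

```python
def eval_at_complex(u, z):
    """Evaluate a real-coefficient polynomial at complex z.

    Divides by t**2 - p*t + r with p = 2*Re(z), r = |z|**2, so the loop
    uses real arithmetic only. Requires len(u) >= 2.
    """
    x, y = z.real, z.imag
    p, r = 2.0 * x, x * x + y * y
    b2, b1 = 0.0, 0.0                  # b_{j-2} and b_{j-1}
    for coeff in u[:-1]:
        b2, b1 = b1, coeff + p * b1 - r * b2
    last = u[-1] + x * b1 - r * b2     # final step with x in place of 2x
    # Remainder is b1*(t - x) + last, and t - x = i*y at t = z.
    return complex(last, y * b1)
```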


Once an iterate is accepted as a zero, the coefficients $q_j$ of the
quotient polynomial Q(z) replace the coefficients $u_j$ and the iteration
process is repeated on the deflated polynomial. Purification of the zeros
is not attempted by ZERPOL.
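
Deflation by a real zero is the same Newton-Horner recurrence with the intermediate values kept. A minimal sketch, with the hypothetical name `deflate_real`:

```python
def deflate_real(u, x):
    """Divide the polynomial with coefficients u by (z - x).

    Returns (quotient coefficients q_0..q_{N-1}, remainder q_N).
    """
    q = [u[0]]
    for coeff in u[1:]:
        q.append(coeff + x * q[-1])
    remainder = q.pop()
    return q, remainder
```

Dividing z^3 - 6z^2 + 11z - 6 by (z - 1) leaves z^2 - 5z + 6 with zero remainder.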


PROGRAMMING DETAILS FOR ZERPOL

STORAGE ALLOCATION. The coefficients of the polynomial are transferred to
the double precision array DU. This array DU is placed in COMMON with
library workspace LIBWSP so that the workspace need not be supplied by the
user. This places a restriction N ≤ 79 on the maximum degree of the
polynomial handled by ZERPOL. However, the restriction can readily be
eliminated by increasing the dimension of library workspace in the calling
program to at least 2N + 2, where N is the degree of the polynomial. Notice
that the double precision leading coefficient DU0 can be referenced as
DU(I0) where I0 = 0.

The  complex  array  Z  of  zeros  is  treated  inside  ZERPOL  as  a  double  precision 
array  so  that  those  elements  of  the  array  Z  which  do  not  contain  zeros  of  the 
polynomial  may  be  used  to  store  temporarily  the  coefficients  of  the  quotient 
polynomial. The coefficients of the quotient polynomial are transferred to
the DU array whenever an iterate is accepted as a zero.

All  diagnostic  messages  initiated  from  ZERPOL  appear  in  DATA  statements 
and  are  issued  through  the  subroutine  UNCLE. 

INITIALIZATION. The function subroutine CI12(N) converts the integer
N from its binary representation to its binary coded decimal (BCD)
representation so that it can appear in the diagnostic messages given by
UNCLE.

A warning message is generated when N > 79. However, this message is
suppressed for all subsequent calls of ZERPOL in the same job.

Over/underflow variables OVFLOW and UNFLOW are saved from the user's
program. The statement NSAV=NFPTST(0) suppresses any messages for the
over/underflow occurring in ZERPOL. (See Programmer's Reference Manual (PRM),
(1964).)

SCALING. If N ≤ 2, or $\max_{0 \le J \le N} |\mathrm{DU}(J)| \ge 2^{101}$, the
coefficients are not scaled. Otherwise the coefficients are scaled by the
scale factor DSC so that $\max_{0 \le J \le N} |\mathrm{DU}(J)| = 2^{101}$.
The scaling procedure is executed in the unnormalized mode (CALL FPTUN) in
order to extend the allowed lower limit to the magnitude of the
coefficients. As a result, ZERPOL can confidently ignore underflow except
when underflow occurs in the first and last coefficients. The standard mode
is re-instated with CALL FPTST.
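
The effect of the upward scaling can be sketched in modern floating point. This sketch deliberately differs from the listing, which multiplies by T101/SC: here a pure power of two is used so that the scaling is exact in binary arithmetic, which places the largest coefficient in the range [2^101, 2^102) rather than exactly at 2^101. The function name `scale_coefficients` and the parameter `target_exp` are hypothetical.

```python
import math

def scale_coefficients(u, target_exp=101):
    """Scale all coefficients by one power of two.

    Returns (scaled list, exponent shift). Nothing is done when the
    largest magnitude is zero or already at least 2**target_exp.
    """
    biggest = max(abs(c) for c in u)
    if biggest == 0.0 or biggest >= 2.0 ** target_exp:
        return list(u), 0
    # frexp gives biggest = m * 2**e with 0.5 <= m < 1.
    _, e = math.frexp(biggest)
    shift = target_exp + 1 - e
    return [math.ldexp(c, shift) for c in u], shift
```

Multiplying every coefficient by the same factor leaves the zeros of the polynomial unchanged.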

Overflow may occur in the evaluation of the polynomial and its derivatives.
When overflow does occur, we attempt to remove the overflow condition by
scaling down the coefficients by $2^{-27}$. If the leading coefficient
becomes unnormalized in the process of scaling down the coefficients, a
message is given stating that the polynomial cannot be evaluated near some
of its zeros without over/underflow.

THE ANNULUS CONTAINING THE SMALLEST ZERO, R ≤ |z| ≤ G'. The geometric mean G
of the magnitudes of the zeros is computed using logarithms in order to
prevent over/underflow of the intermediate results.
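
A minimal sketch of the logarithmic computation, assuming nonzero leading and trailing coefficients; the function name `geometric_mean_of_zeros` is hypothetical, and the small safety increment that the listing adds to the exponent is omitted.

```python
import math

def geometric_mean_of_zeros(u):
    """Return |u_N/u_0|**(1/N) without forming the quotient u_N/u_0."""
    n = len(u) - 1
    exponent = (math.log2(abs(u[-1])) - math.log2(abs(u[0]))) / n
    return 2.0 ** exponent
```

With u_0 = 10^-300 and u_N = 10^300 the quotient u_N/u_0 is not representable in double precision, yet the geometric mean, about 10^300 for N = 2, is.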

The reciprocal of the Newton step at the origin is checked for overflow.
If it overflows, a zero is close enough to the origin to be considered as
zero. Also, $|P'(0)/P(0)| < 2^{127}$ ensures that the Cauchy lower bound R
does not underflow.

The Fejer bound at the point X is the magnitude of the zero F of
smaller magnitude of the quadratic equation

$$\frac{P''(X)}{2N(N-1)}\,F^2 + \frac{P'(X)}{N}\,F + \frac{P(X)}{2} = 0.$$

The Laguerre step $\mathcal{L}(X)$ is simply related to the zero F by the
formula

$$\mathcal{L}(X) = \frac{F}{(N-2)\,P'(X)\,F/(N\,P(X)) + N - 1}.$$

The values of the polynomial and its two derivatives at the origin
are given by the coefficients DU(N), DU(N-1) and 2 DU(N-2). The Fejer
bound is computed using the subroutine QDRTC and the Laguerre step and
Laguerre bound at the origin are computed immediately. Thus

$$B = 1.0001 \min\left(\sqrt{N}\,|\mathcal{L}(0)|,\ |F|,\ G\right)$$

is an upper bound for the magnitude of the smallest zero of the polynomial.
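
The relation between the Fejer zero F and the Laguerre step can be checked numerically with a short sketch. The names `fejer_zero` and `laguerre_step` are hypothetical, the arguments p0, p1, p2 stand for P(X), P'(X), P''(X), and the sketch assumes P(X) ≠ 0 and that P'(X) and P''(X) do not both vanish. It uses complex arithmetic from the standard library in place of QDRTC and ignores overflow.

```python
import cmath

def fejer_zero(p0, p1, p2, n):
    """Smaller-magnitude zero F of (p2/(2n(n-1)))F^2 + (p1/n)F + p0/2 = 0."""
    a = p2 / (2.0 * n * (n - 1))
    b = p1 / n
    c = p0 / 2.0
    root = cmath.sqrt(b * b - 4.0 * a * c)
    # Pick the sign that makes the denominator as large as possible.
    d = b + root if abs(b + root) >= abs(b - root) else b - root
    return -2.0 * c / d

def laguerre_step(p0, p1, p2, n):
    """Laguerre step obtained from the Fejer zero F."""
    f = fejer_zero(p0, p1, p2, n)
    return f / ((n - 2) * p1 * f / (n * p0) + n - 1)
```

For P(z) = (z - 1)^3 at the origin the step is exactly 1, consistent with Laguerre's method being exact for a zero of multiplicity N.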

Next, the Cauchy lower bound R for the smallest zero is computed, where
R is the positive zero of the polynomial

$$S(z) = \sum_{I=0}^{N-1} |\mathrm{DU}(I)|\, z^{N-I} - |\mathrm{DU}(N)|.$$

This zero R can readily be computed using the Newton-Raphson method with
$x_0 = B$ because all the derivatives of S(z) are positive for z positive.

Notice that for X ≥ R

$$X\,S'(X) \ge |\mathrm{DU}(N)|,$$

so

$$S'(X) \ge 2^{-129}.$$

Therefore we do not expect S'(X) to underflow. The sequence of iterates
$(x_j)$ terminates when $x_{j+1} \ge x_j$ for the first time. If overflow has
occurred in the computation of the last iterate, that iterate is probably
incorrect and can be corrected easily only by scaling down the coefficients
of the polynomial. If no overflow occurs, $0.99999\,x_j$ is accepted as a
lower bound for the magnitude of the smallest zero of the polynomial. Since
$R/(2^{1/N} - 1)$ is an upper bound for the smallest zero of P(z) and
$R/(2^{1/N} - 1) < N(1.445)R$, then G' = min(B, N(1.445)R) is accepted as an
upper bound for the magnitude of the smallest zero of the polynomial P(z).
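
The Newton iteration for the Cauchy lower bound can be sketched as follows. The function name `cauchy_lower_bound` is hypothetical, the starting point is assumed to be an upper bound such as B, and the safety factor 0.99999 and the overflow checks of ZERPOL are omitted.

```python
def cauchy_lower_bound(u, start):
    """Positive zero of S(x) = sum_{i<N} |u_i| x**(N-i) - |u_N|.

    Newton's method from start >= R decreases monotonically because S is
    increasing and convex for x > 0; stop at the first non-decrease.
    """
    mags = [abs(c) for c in u]
    x = start
    while True:
        s = 0.0        # accumulates the derivative
        t = mags[0]    # accumulates sum |u_i| x**(N-1-i)
        for m in mags[1:-1]:
            s = x * s + t
            t = x * t + m
        s = x * s + t                   # S'(x)
        step = (x * t - mags[-1]) / s   # S(x)/S'(x)
        new_x = x - step
        if new_x >= x:
            return x
        x = new_x
```

For z^2 - 3z + 2, whose zeros are 1 and 2, the bound is the positive zero of x^2 + 3x - 2, about 0.5616.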


THE ITERATION PROCEDURE. The strategy of this section of the algorithm
has been described previously in section one. To assist the reader in
following the FORTRAN code, STARTD and SPIRAL are logical variables
indicating whether or not the iteration procedure has started successfully
and whether or not a spiral search for an initial iterate has started.

Laguerre's method may be exact for zeros of multiplicity N-1 and N,
so the initial iterate from the origin is allowed to reach the outer
circle of the annulus R ≤ |z| ≤ G' whenever this annulus is relatively
narrow (i.e. R > G'/2).

The time required to compute the value of the polynomial and its
derivatives at a real point is less than the time at a complex point, so
an iterate is forced to be real whenever the imaginary part of the
iterate $x_j$ is less than one-fifth of the step $x_j - x_{j-1}$ to that
iterate.


POLYNOMIAL  EVALUATIONS.  The  polynomial  value  and  its  first  derivative 
are  computed  using  double  precision  arithmetic  while  the  second  derivative 
is  computed  with  single  precision  arithmetic.  We  felt  that  the  improved 
convergence  to  rare  multiple  zeros  was  not  worth  the  cost  in  extra  time 
of  computing  the  second  derivative  with  double  precision  arithmetic.  The 
unnormalized  mode  is  used  for  the  above  computation. 


The  evaluations  of  the  polynomial  and  its  derivatives  at  a  real  iterate 
and  at  a  complex  iterate  are  done  in  separate  blocks.  The  computation  in 
the  case  of  a  real  iterate  is  straightforward.  However,  precautions  need 
be  taken  when  the  magnitude  of  a  complex  iterate  is  extremely  large  or 
small. 


In the case of a complex iterate X, the squared modulus of the complex
iterate appears in the quadratic factor and so may over/underflow. Thus
whenever

$$|X| > 2^{63.5} \quad\text{(square root of overflow)}$$

or

$$|X| < 2^{-64.5} \quad\text{(square root of underflow)}$$

then the coefficients of the quadratic factor $(z - X)(z - \bar{X})$ are
carefully scaled so that the possibility of overflow or underflow in the
evaluation loop is minimized.

If overflow cannot be avoided in the evaluation loops the coefficients
are scaled down by $2^{-27}$.

If the modulus of the polynomial is greater than the error bound in
the computed value of the polynomial, and the modulus of the polynomial
underflows, then a message is given declaring that over/underflow occurs
in the evaluation of the polynomial near one of its zeros. The last iterate
is accepted as a zero of the polynomial.

If the reciprocal of the Newton step at the last iterate overflows,
then the last iterate is within a distance of $N\,2^{-127}$ of a zero of the
polynomial. The last iterate is accepted as a zero of the polynomial but
underflow is signalled.

SEARCHING THE ANNULAR REGION FOR AN INITIAL ITERATE. The search for
an acceptable starting point for the iteration procedure starts with a
point on the inner circle of the annulus in the direction of the Laguerre
step from the origin. Subsequent trial points lie on the spirals traced
out by $R\,(i - 1.25/N)^k$ for k = 0, 1, ..., where the angle between
successive trial points is $-\tan^{-1}(N/1.25)$, or just more than -90°. If
every fourth trial point is examined, the locus is a spiral progressing in
a counterclockwise direction. The constant 1.25 is chosen in the hope that
the distribution of the trial points is dense enough in the annular region
to find an initial iterate but not so dense that a great deal of time is
spent searching for a suitable initial iterate.

TEST RESULTS. ZERPOL was tested with polynomials given in papers by
P. Henrici and B.O. Watkins (1964) and E.H. Bareiss (1965). In all cases
ZERPOL satisfied our criterion for the accuracy of the zeros, namely that
the coefficients of the polynomial reconstructed from the zeros given by
ZERPOL closely resemble the original coefficients.
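
The accuracy criterion can be sketched by multiplying out the linear factors and comparing with the original coefficients. The function name `coefficients_from_zeros` is hypothetical, and how closely the coefficients must agree is left to the reader since the paper gives no numerical tolerance.

```python
def coefficients_from_zeros(leading, zeros):
    """Expand leading * prod(t - z) into coefficients, highest power first."""
    coeffs = [complex(leading)]
    for z in zeros:
        coeffs.append(0j)
        # Multiply by (t - z); go from the high index down so that each
        # update still sees the old value of its left neighbour.
        for k in range(len(coeffs) - 1, 0, -1):
            coeffs[k] -= z * coeffs[k - 1]
    return coeffs
```

The zeros 1, 2 and 3 with leading coefficient 1 reproduce z^3 - 6z^2 + 11z - 6.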

ZERPOL computes all zeros of a polynomial of degree N in roughly $N^2$
milliseconds on our IBM 7094-II and consists of approximately 550 cards.

We  compared  ZERPOL  with  the  package  of  subroutines  catalogued  in  1965 
as  SDA-3332  in  the  SHARE  library.  This  routine  found  the  zeros  of  the 
test  polynomials  taking  from  two  to  five  times  longer  than  ZERPOL.  We 
also  compared  results  from  ZERPOL  with  the  subroutine  POLRT  from  the  IBM 
System/360  Scientific  Subroutine  Package  (1966).  This  subroutine  is  about 
as  fast  as  ZERPOL,  but  sometimes  gives  wrong  answers. 

The  following  table  gives  some  statistics  on  the  number  of  steps 
required  to  find  all  the  zeros  of  polynomials  of  varying  degrees.  The 
coefficients  of  these  polynomials  are  random  numbers  taken  from  a  normal 
distribution  with  mean  0  and  variance  1  . 

                        Laguerre steps       Search steps         Half steps
            No. of      per iterated zero    per iterated zero    per iterated zero
Degree   Polynomials    Average  Maximum     Average  Maximum     Average  Maximum

   3         100          3.9      6.0         0.3      2.0         0.02     5.0
   6          28          4.5      6.0         0.54     3.7         0.14     4.0
  12           7          4.7      5.2         0.55     4.3         0.25     1.7
  18           4          5.0      5.5         1.20     4.3         0.46     2.0

This  version  of  ZERPOL  was  produced  during  the  author’s  work  for  an 
M.Sc  degree  at  the  University  of  Toronto  under  the  supervision  of  W.  Kahan, 
with  the  support  of  a  Province  of  Ontario  Fellowship. 
APPENDIX - ZERPOL LISTING


      SUBROUTINE ZERPOL ( A , NDEG , Z )
C
C     CALL ZERPOL ( A , N , Z )  TO SET  Z(I) = I-TH ZERO OF THE POLYNOMIAL
C          A(1)*Z**N + A(2)*Z**(N-1) + ... + A(N)*Z + A(N+1)
C     NOTE THAT N = DEGREE OF THE POLYNOMIAL, AND THERE ARE N+1 REAL
C     COEFFICIENTS.   DIMENSIONS -   REAL A( AT LEAST N+1 )
C                                    COMPLEX Z( AT LEAST N )
C     NORMALLY, N SHOULD NOT EXCEED 79. OTHERWISE THE PROGRAMMER SHOULD
C     INCLUDE  COMMON /LIBWSP/ LIBWSP( AT LEAST 2*N+2 )

USt  HULi-iY  ru  CHfcCK  ACCURACY  . 


C     COMPLEX CONJUGATE ZEROS Z(I) OCCUR CONSECUTIVELY, I.E.
C          IF ( Z(I) IS COMPLEX )  EITHER  Z(I+1) = CONJG(Z(I))
C                                  OR ELSE Z(I-1) = CONJG(Z(I)) .
C     IF ALL COEFFICIENTS A(I) = 0 , THE 0.0/0.0 DIAGNOSTIC IS PRODUCED.
C     ******************************************************************

  000 CONTINUE
      REAL A(80)
C     COMPLEX Z(79)
      DOUBLE PRECISION Z(79)
C
C     DU(I) IS THE COEFFICIENT OF Z**(N-I) IN THE CURRENT POLYNOMIAL .
      DOUBLE PRECISION DU0 , DU(79)
      COMMON /LIBWSP/LIBWSP(160)
      EQUIVALENCE (LIBWSP , DU0 ), ( LIBWSP(3), DU )
C
      LOGICAL OVF, UNF, SAVO, SAVU, STARTD, SPIRAL
      LOGICAL OVFLOW , UNFLOW , TOOBIG
      COMMON /OVFLOW/OVFLOW , /UNFLOW/UNFLOW
      DATA TOOBIG/.TRUE./
C


      DIMENSION   ACF1(2),  ACF2(2) ,  ACF(2)
      COMPLEX     CF1,      CF2,       CF
      EQUIVALENCE (CF1, ACF1), (CF2,ACF2), (CF,ACF)
C
      COMPLEX     CDIRO ,   CSPIR
      DIMENSION   ACDIR(2) ,  AC(2) ,  ACL(2)
      COMPLEX     CDIR ,    C,       CL
      EQUIVALENCE (CDIR, ACDIR), (C,AC) , (CL,ACL)
C
      DOUBLE PRECISION DZNR, DZNI, DZOR, DZOI
      DOUBLE PRECISION DX, DR, DSC, DY, DX2, DV
      EQUIVALENCE (DX,X), (DR,R), (DSC,SC), (DY,Y), (DX2,X2), (DV,V)
      DOUBLE PRECISION DT , DT1


      DIMENSION MESSH(10)
      DATA MESSH(1)/ 54H0 A POLYNOMIAL OF DEGREE             HAS NO Z
     $EROS.    /,  MESSH(10) / O777777777777/
      COMPLEX CMESSH
      EQUIVALENCE (CMESSH, MESSH(5))
C

      DIMENSION MESH(22)
      DATA MESH(1) /126H0 THERE IS SOME REASON TO BELIEVE THAT THE FIRST
     $             ZEROS ARE INCORRECT. QUICKLY CALL W. KAHAN OR B.
     $ T. SMITH.       /,  MESH(22)/ O777777777777 /
      COMPLEX CMESH
      EQUIVALENCE (CMESH, MESH(9))
C

      DIMENSION MESS(18)
      DATA MESS(18) / O777777777777 /,  MESS(1) /
     $102H0YOUR N =                EXCEEDS 79 , AND REQUIRES THE DIMEN
     $SION OF LIBWSP TO BE AT LEAST 2*N+2 .     /
      COMPLEX CMESS
      EQUIVALENCE (CMESS, MESS(3)),  (FINITY, MESS(18))
C

      DIMENSION OVFUNF(16)
      DATA OVFUNF(16)/O777777777777/,  OVFUNF(1)/
     $ 90H0ZERPOL CANNOT EVALUATE THE GIVEN POLYNOMIAL NEAR SOME OF ITS Z
     $EROS WITHOUT OVER/UNDERFLOW/
C
      DATA BIT /O400000000/, TM27 /O146400000000/ , I0/0/,
     $     T635/O300552023623/,  T101 /O346400000000/ ,
     $     TM645/O100552023623/
C
C     BIT=2.**-129=SMALLEST NO.     T101=2.**101     TM27=2.**-27
C     FINITY=-2.**127=-LARGEST NO.      TM645=2**(-64.5)
C     T635=2**(63.5)


C
C     SPECIAL FUNCTIONS-
C     ALOG2(X) = LOGARITHM OF X TO THE BASE 2.
C     TWOXP(X) = 2.**X
C     CI12(J) = ALPHABETIC REPRESENTATION OF J IN I12 FORMAT (CMPLX)
C     AND(X,Y) LOGICALLY 'ANDS' X AND Y BIT BY BIT .
C     AMAXA( T,I,J,K,L ) FINDS THE MAXIMUM OF ABS(T(I)) FOR I FROM J UP
C          TO K IN STEPS OF L.
C     AMIN1( X,Y,...,Z ) FINDS THE MINIMUM OF ITS ARGUMENTS X,Y,...,Z .
C
C     GAMA, THETA, AND PHI ARE TEST PARAMETERS FOR ZERPOL .
      DATA GAMA/0.3/, THETA/1.0/, PHI/0.2/
C     UN40=40.*2**(-53)     UN10=10.*2**(-53)
      DATA UN40/O121500000000/ , UN10/O117500000000/
   50 CONTINUE
      N = NDEG
      CMESSH = CI12( N )
      IF ( N .LE. 0 )  CALL UNCLE( 0, MESSH )
      IF ( .NOT. TOOBIG .OR. N .LE. 79 )  GO TO 51
      TOOBIG = .FALSE.
      CMESS = CMESSH
      CALL UNCLE ( -75, MESS )
C     SAVE OVERFLOW/UNDERFLOW INDICATORS OF THE CALLING PROGRAM .
   51 SAVO = OVFLOW
      SAVU = UNFLOW
      NSAVE = NFPTST(0)
      OVF = .FALSE.
      UNF = .FALSE.
C
C     MOVE THE COEFFICIENTS A(I) TO DU(I-1)
      DO 52 I = I0,N
   52 DU(I) = A(I+1)
C

C     SCALING ( ONLY WHEN N .GT. 2 )
  100 CONTINUE
      IF ( N .LE. 2 )  GO TO 204
      ASSIGN 400 TO LSW
C     ( SEE STATEMENT 300 .)
      SC = AMAXA( DU0, I, 1, 2*N+1, 2 )
      IF ( SC .EQ. 0. )  GO TO 206
      IF ( SC .GE. T101 )  GO TO 105
      SC = T101/SC
C     SCALE BY SC TO HAVE  MAX(DU(I),I=0,N)  APPROACH  2**100 .
      GO TO 103
C
C     ( RE-SCALING NECESSITATED BY OVERFLOW USES SC=2.**(-27) .)
  102 SC = TM27
C
  103 CONTINUE
      CALL FPTUN
      DO 104 I = I0,N
  104 DU(I) = DSC*DU(I)
      CALL FPTST
C     FIND NUMBER I OF CONSECUTIVE LEADING COEFFICIENTS EQUAL TO ZERO .
  105 DO 106 I = I0,N
      IF ( AND( DU(I) , BIT ) .NE. 0. )  GO TO 107
C     EACH VANISHING LEADING COEFFICIENT YIELDS AN INFINITE ZERO .
      J = N-I
  106 Z(J) = FINITY
  107 IF ( I .EQ. 0 )  GO TO 204
C
C     SHIFT BACK COEFFICIENTS AND DECLARE OVERFLOW .
      DO 108 K = I,N
      J = K-I
  108 DU(J) = DU(K)
      N = N-I
      IF ( SC .EQ. TM27 )  CALL UNCLE( 73, OVFUNF )
      OVF = .TRUE.
      GO TO 203


■>j  =  .v  1 

N  =•  ,v  - 1 

ASSIGN  400  YU  LS'V 

(  Stc  s r a i  r  300  . ) 

U  V  r  L  0  W  =  .  I-  A  L  S  “  . 

UimpLijD  -  ,  rAl.  Sr. _ 

Ip  (  ki-2  )  20d,  206,  300 

  205 Z(1) = DBLE( CMPLX( RND(-DU(1)/DU0) , 0. ) )
      GO TO 207
  206 CALL QDRTC( RND(DU0), RND(DU(1))+0. , RND(DU(2))+0. , Z(2), Z(1) )
  207 OVF = OVF .OR. OVFLOW
      UNF = UNF .OR. UNFLOW
C     RESTORE OVERFLOW AND UNDERFLOW INDICATORS AND ENABLE MESSAGE .
      OVFLOW = SAVO
      UNFLOW = SAVU
      NSAVE = NFPTST(NSAVE)
C     PROVIDE ONLY THE RELEVANT OVER/UNDERFLOW MESSAGES.
      IF(OVF) SC = FINITY*FINITY
      IF(UNF) SC = BIT*BIT
      RETURN
C     CHECK FOR ZEROS = (0.,0.)  (HENCEFORTH N .GT. 2 )
  300 IF ( AND( DU(N) , BIT ) .NE. 0. )  GO TO LSW,( 400, 700 )
C     (ENTRY FROM BLOCK 600 IF RECIPROCAL OF NEWTON STEP OVERFLOWS . )
  301 IF ( SNGL(DU(N)) .NE. 0. )  UNF = .TRUE.
      Z(N) = 0.D0
      GO TO 202
C     HENCEFORTH N .GT. 2,  DU0 .NE. 0. ,  AND DU(N) .NE. 0.
C     INITIALIZE SOME USEFUL CONSTANTS.
  400 CONTINUE
      XN = N
      XN1 = XN - 1.
      XN2 = XN1 - 1.
      X2N = 2./XN
      X2N1 = X2N/XN1
      XN2N = XN2/XN
      N1 = N-1
      RTN = SQRT(XN)


Acft  '  C  \  ,?  i  .O  I  rc^n/i*  olo-ihl-Z  -f>rrciz>on  T>  *•>  . 

i:t>HCC  Z>,  21  )  Hcior>  A  i'-r  62  *  C  o  4s-~  I'C  a.^  Z  L  , 
C     CALCULATE G , AN UPPER BOUND FOR THE NEAREST ZERO .
C     START WITH G = CABS( GEOMETRIC MEAN OF THE ZEROS ) .
  500 G = TWOXP( (ALOG2(ABS(DU(N))) - ALOG2(ABS(DU0)))/XN + 1.E-5 )
C     CALCULATE LAGUERRE-STEP CDIR AND FEJER-BOUND FOR G .
C     CALCULATION OF THE LAGUERRE STEP INVOLVES THE SQUARE OF
C     RECIPROCAL OF NEWTON'S STEP. SINCE IT CAN EASILY OVERFLOW, THE
C     FEJER BOUND IS CALCULATED WITH NO SUCH OVERFLOWS AND THE
C     LAGUERRE STEP IS CALCULATED FROM IT.
      OVFLOW = .FALSE.
      R = SNGL( DU(N-1) )/SNGL( DU(N) )
C     IF OVFLOW, A ROOT OF POLY. IS WITHIN  N*2**(-127)  OF  0.
      IF ( OVFLOW )  GO TO 301
      CALL QDRTC( X2N1*SNGL(DU(N-2)) , X2N*SNGL(DU(N-1)) , SNGL(DU(N)) ,
     $            C , CF1 )
      R = XN2N*R
      CDIRO = C/CMPLX( R*AC(1) + XN1 , R*AC(2) )
      ABDIRO = ABS( REAL(CDIRO) ) + ABS( AIMAG(CDIRO) )
      G = AMIN1( G, 1.0001*AMIN1( ABS(AC(1)) + ABS(AC(2)) , RTN*ABDIRO ))
C     CALCULATE THE CAUCHY-LOWER BOUND R FOR THE SMALLEST ZERO BY
C     SOLVING  ABS(DU(N)) = SUM( ABS(DU(I))*R**(N-I) , I = 0, N-1 )
C     USING NEWTON'S METHOD .
      R = G

      CALL FPTUN
  601 T = ABS(DU0)
      S = 0.
      OVFLOW = .FALSE.
      DO 602 I = 2,N
      S = R*S + T
  602 T = R*T + ABS( DU(I-1) )
      S = R*S + T
C     IT CAN BE PROVED THAT S CANNOT UNDERFLOW .
      T = (R*T - ABS( DU(N) ))/S
      S = R
      R = RND( R - T )
      IF ( R .LT. S )  GO TO 601
      IF ( OVFLOW )  GO TO 102
C     R/(2**(1/N) - 1) .LT. 1.445*N*R  IS ANOTHER UPPER BOUND , SO SET
      GO = AMIN1( 1.445*XN*R , G )
      RO = 0.99999*S
      ASSIGN 700 TO LSW
C     ( SEE STATEMENT 300 .  UNLESS DEGREE OF POLY. IS REDUCED, RO, GO
C     AND ABDIRO ARE UNCHANGED .
C     NOW  RO .LT. CABS( SMALLEST ZERO ) .LT. GO
  700 CONTINUE
      FEJER = GO
      G = GO
      CDIR = CDIRO
      ABDIR = ABDIRO
      DZNR = 0.D0
      DZNI = 0.D0
      FN = ABS(DU(N))
      SPIRAL = .FALSE.
      STARTD = .FALSE.

C     RE-ENTRY POINT TO ACCEPT , MODIFY, OR REJECT THE LAGUERRE STEP .
C     GAMA, THETA, PHI ARE ARBITRARY PARAMETERS . ZERPOL IS TO BE TESTED
C     FOR SPEED AND RELIABILITY WHEN THEY ARE VARIED. POSSIBLE
C     VALUES ARE GAMA=0.3, THETA=1.0, PHI=0.2 .
  701 V = ABDIR/G
C     ACCEPT CDIR IF CABS(CDIR) .LE. GAMA*G .
      IF( V .LE. GAMA )  GO TO 800
C     REJECT CDIR IF CABS(CDIR) .GT. THETA*G
      IF ( V .GT. THETA )  GO TO 1100
C     MODIFY CDIR SO THAT CABS(CDIR) = GAMA*G .
      IF ( .NOT. ( STARTD .OR. SPIRAL ) .AND. RO .GT. GAMA*G )  GO TO
     $ 800
      V = GAMA/V
      CDIR = CMPLX( V*ACDIR(1) , V*ACDIR(2) )
      ABDIR = ABDIR*V
C
C     ACCEPT PREVIOUS ITERATE. SAVE DATA ASSOCIATED WITH CURRENT ITERATE
  800 CONTINUE
      G = FEJER
      CL = CDIR
      ABSCL = ABDIR
      FO = FN
      DZOR = DZNR
      DZOI = DZNI
C     CDIR AT THE ORIGIN IS IN THE DIRECTION OF DECREASING FUNCTION
C     VALUE SO
      STARTD = .TRUE.
C     THE NEXT ITERATE IS ZN=CMPLX( DZNR , DZNI ), WHERE
C     (ENTRY POINT WHEN CDIR IS NOT ACCEPTED. )
  801 DZNR = DZOR + ACL(1)
      DZNI = DZOI + ACL(2)
C     IS ZN CLOSE TO THE REAL AXIS RELATIVE TO STEP SIZE .
C     (ENTRY POINT FROM THE SPIRAL BLOCK.)
  802 IF ( ABS(DZNI) .LE. PHI*ABSCL )  GO TO 930
  900 CONTINUE
C     ZN IS COMPLEX .
C     FACTORIZATION OF POLYNOMIAL BY QUADRATIC FACTOR ( Z**2-X2*Z+R )
C
C     SUM(DU(I)*Z**(N-I)) = (Z**2-X2*Z+R)*SUM(Z(I)*Z**(N-I-2)) +
C                           Z(N-1)*(Z-X) + Z(N)       FOR ALL Z ,
C     THE VALUE OF THE POLYNOMIAL AT (X,Y) IS CF .
C     FIRST DERIVATIVE OF POLYNOMIAL AT (X,Y) IS CF1 , AND
C     SECOND DERIVATIVE OF POLYNOMIAL AT (X,Y) IS 2.*CF2 ,
C     WHERE (X,Y) IS A ZERO OF Z**2-X2*Z+R .
C     E IS ERROR BOUND FOR THE VALUE OF POLYNOMIAL AND
C     Z(I) ARE THE COEFFICIENTS OF QUOTIENT POLYNOMIAL .
C     BE SURE THAT THE OVERFLOW INDICATOR IS TURNED OFF.
      OVFLOW = .FALSE.
C     CALL FPTUN TO REDUCE ERRORS CAUSED BY INTERMEDIATE UNDERFLOWS .
      CALL FPTUN
C     INITIALIZATIONS FOR THE EVALUATION LOOPS .


  901 S = 0.
      S1 = 0.
      T1 = 0.
      DT = DUO
C     INDEX J IS USED TO CHANGE DX2 ON THE LAST ITERATION .
      J = 3
C     SET Z(X,Y) TO ZN(ZNR,ZNI) .
      DX = DZNR
      DY = DZNI
C     SC IS ESTIMATED IN CASE SCALING IS NEEDED IN BLOCK 900 .
      SC = CABS( CMPLX(DX,DY) )
C     IF CABS(ZN) .LE. SQRT( SMALLEST NO. ) , SCALE UP X AND Y .
C     IF CABS(ZN) .GE. SQRT( LARGEST NO. ) , SCALE DOWN X AND Y .
      IF( SC .GE. T63B .OR. SC .LE. TM64S ) GO TO 905
C     SCALING OF X2 AND R IS UNNECESSARY.
      DX2 = DX + DX
      DR = DX**2 + DY**2
      Z(1) = DU(1) + DX2*DUO
      Z(2) = DU(2) + ( DX2*Z(1) - DR*DUO )
      IF ( J .LT. N ) GO TO 903


  902 DX2 =
      J = N
  903 DO 904
      V = S1*R
      S1 = S
      S = T1 + (X2*S - V)
      DV = DT1*DR
      DT1 = DT
      DT = ( DX2*DT - DV ) + Z(I-2)
  904 Z(I) = DU(I) + ( DX2*Z(I-1) - DR*Z(I-2) )
      IF ( J .LT. N ) GO TO 902
      GO TO 909
C     SCALE X AND Y LEST R OVERFLOWS OR UNDERFLOWS .
  905 DX = DX/DSC
      DY = DY/DSC
C     DR CANNOT OVERFLOW , FORTUNATELY .
      DR = ( DX**2 + DY**2 )*DSC
      DX2 = DX + DX
      Z(1) = DU(1) + (DX2*DUO)*DSC
      Z(2) = DU(2) + ( DX2*Z(1) - DR*DUO )*DSC
      IF( J .LT. N ) GO TO 907
  906 DX2 = DX
      J = N
  907 DO 908 I = J,N1
      V = S1*R
      S1 = S
      S = T1 + (X2*S - V)*DSC
      DV = DT1*DR
      DT1 = DT
      DV = DX2*DT - DV
      DT = Z(I-2) + DV*DSC
  908 Z(I) = DU(I) + ( DX2*Z(I-1) - DR*Z(I-2) )*DSC
      IF( J .LT. N ) GO TO 906

C     (ENTRY POINT FROM THE NON-SCALING BLOCK . )
  909 CF = CMPLX( Z(N) , DZNI*Z(N-1) )
      FN = CABS(CF)
C     IF OVFLOW, THE COEFFICIENTS MUST BE SCALED DOWN.
      IF(OVFLOW) GO TO 102
      E = ABS(DUO)
      DO 910 I = 1,N
  910 E = ABS(Z(I)) + SC*E
      E = UN40*E
      IF(OVFLOW) E = XN*E
C     CHECK TO SEE IF ZN IS A ZERO.
      IF( FN .LE. E ) GO TO 1001
C     IF FN HAS UNDERFLOWED, GIVE THE MESSAGE OVFUNF .
      IF( AND( BIT , FN ) .NE. 0. ) GO TO 911
      CALL UNCLE( 73, OVFUNF )
      GO TO 1000
  911 CALL FPTST
C     HAS THE FUNCTION VALUE DECREASED .
      IF( FN .GE. FO .AND. STARTD ) GO TO 1100
C
      DV = 2.D0*DZNI
      CF1 = CMPLX( RND( Z(N-1) - (DV*(DT1*DZNI)) ) , RND( DV*DT ) )
      CF2 = CMPLX( T - V*(V*S) , SNGL(DZNI)*(3.*T1 - V*(V*S1)) )
C     FIND THE LAGUERRE STEP AT ZN,
      OVFLOW = .FALSE.
      C = CF1/CF
C     IF OVFLOW, THERE IS A ZERO WITHIN A DISTANCE OF N*2**(-127) OF ZN
      IF(OVFLOW) GO TO 1000
C     COMPUTE THE LAGUERRE STEP CDIR AND THE BOUND FEJER AT ZN .
      CALL CQDRTC ( CMPLX( X2N1*ACF2(1) , X2N1*ACF2(2) ) ,
     S  CMPLX( X2N*ACF1(1) , X2N*ACF1(2) ) , CF, CDIR, CF1 )
      FEJER = ABS(ACDIR(1)) + ABS(ACDIR(2))
      C = CMPLX ( XN2N*AC(1) , XN2N*AC(2) )
      C = C*CDIR
      C = CMPLX ( AC(1) + XN1 , AC(2) )
      CDIR = CDIR/C
      ABDIR = ABS(ACDIR(1)) + ABS(ACDIR(2))
      FEJER = AMIN1( RTN*ABDIR , FEJER )
C     IS THE STEP SIZE NEGLIGIBLE . (THIS TEST MAY BE REDUNDANT )
      DX = DABS(DZNR) + DABS(DZNI)
      IF( DX + ABDIR .EQ. DX ) GO TO 1002
C     NOW DETERMINE WHETHER CDIR IS ACCEPTABLE .
      GO TO 701
C

  950 CONTINUE
C     FACTORIZATION OF POLYNOMIAL BY LINEAR FACTOR (Z-X) AS FOLLOWS
C
C     SUM( DU(I)*Z**(N-I) ) = (Z-X)*SUM( Z(I)*Z**(N-1-I) ) +Z(N)
C     FOR ALL Z ,
C
C     SO Z(N) IS VALUE OF POLYNOMIAL AT Z = X ,
C     FIRST DERIVATIVE OF POLYNOMIAL AT Z=X IS V , AND
C     SECOND DERIVATIVE OF POLYNOMIAL AT Z=X IS 2.*W ,
C     E IS ERROR BOUND FOR THE VALUE OF POLYNOMIAL AND
C     Z(I) ARE THE COEFFICIENTS OF QUOTIENT POLYNOMIAL .
      OVFLOW = .FALSE.
C     BE SURE THAT THE OVERFLOW INDICATOR IS TURNED OFF.
      DX = DZNR
      DZNI = 0.D0
      ABX = ABS(X)
      DV = DUO
      W = 0.
C     CALL FPTUN TO REDUCE ERRORS CAUSED BY INTERMEDIATE UNDERFLOWS .
      CALL FPTUN
      Z(1) = DU(1) + DX*DUO
      E = ABS(Z(1)) + ABX*ABS(DUO)
      DO 951 I = 2,N
      W = V + X*W
      DV = Z(I-1) + DX*DV
  951 Z(I) = DU(I) + DX*Z(I-1)
      FN = ABS( Z(N) )
      F = SNGL(Z(N))
      IF(OVFLOW) GO TO 102
      E = ABS(DUO)
      DO 952 I = 1,N
  952 E = ABS( Z(I) ) + ABX*E
      E = UN40*E
      IF(OVFLOW) E = XN*E
C     CHECK WHETHER AN ACCEPTABLE ZERO HAS BEEN FOUND .
      IF( FN .LE. E ) GO TO 1051
C     IF FN HAS UNDERFLOWED, GIVE THE MESSAGE OVFUNF .
      IF( AND( BIT , FN ) .NE. 0. ) GO TO 953
      CALL UNCLE( 73, OVFUNF )
      GO TO 1050
  953 CALL FPTST
C     HAS THE FUNCTION VALUE DECREASED .
      IF( FN .GE. FO .AND. STARTD ) GO TO 1100
C
      OVFLOW = .FALSE.
C     FIND THE LAGUERRE STEP AT DZNR .
      R = V/F
C     IF OVFLOW, A ROOT OF POLY. IS WITHIN 4*N*( SMALLEST NO. ) OF ZN.
      IF(OVFLOW) GO TO 1050
      CALL QDRTC( X2N1*W , X2N*V , F , C , CF1 )
C     CALCULATE THE FEJER BOUND FOR SMALLEST ZERO .
      FEJER = ABS(AC(1)) + ABS(AC(2))
      R = XN2N*R
      CDIR = C/CMPLX( R*AC(1) + XN1 , R*AC(2) )
      ABDIR = ABS(ACDIR(1)) + ABS(ACDIR(2))
      FEJER = AMIN1( RTN*ABDIR , FEJER )
C     IS THE STEP SIZE NEGLIGIBLE .
      DX = DABS(DZNR)
      IF( DX + ABDIR .EQ. DX ) GO TO 1052
C     NOW DETERMINE WHETHER CDIR IS ACCEPTABLE .
      GO TO 701
C

C     ACCEPT CZN AS A COMPLEX ZERO .
 1000 CONTINUE
C     SET UNDERFLOW INDICATOR TO .TRUE. WHEN FN UNDERFLOWS
      UNF = .TRUE.
C     PUT COEFFICIENTS OF QUOTIENT POLYNOMIAL IN DU ARRAY .
C     ENTRY POINT WHEN FN HAS NOT UNDERFLOWED .
 1001 CALL FPTST
C     ENTRY POINT WHEN STEP SIZE IS NEGLIGIBLE .
 1002 DO 1003 I = 3,N
 1003 DU(I-2) = Z(I-2)
C     DUO IS UNCHANGED FOR THE DEFLATED POLYNOMIAL.
      Z(N) = DSIC( CMPLX( RND(DZNR) , RND(DZNI) ) )
      Z(N-1) = DSIC( CONJG( Z(N) ) )
      GO TO 201

C
C     ACCEPT ZN AS A REAL ZERO .
 1050 CONTINUE
C     SET UNDERFLOW INDICATOR TO .TRUE. WHEN FN UNDERFLOWS
      UNF = .TRUE.
C     PUT COEFFICIENTS OF QUOTIENT POLYNOMIAL IN DU ARRAY .
 1051 CALL FPTST
C     ENTRY POINT WHEN STEP SIZE IS NEGLIGIBLE .
 1052 DO 1053 I = 2,N
 1053 DU(I-1) = Z(I-1)
C     DUO IS UNCHANGED FOR THE DEFLATED POLYNOMIAL.
      Z(N) = RND(DZNR)
      GO TO 202

C     CURRENT LAGUERRE STEP IS NOT ACCEPTABLE .
 1100 CONTINUE
C     IF STARTD, REDUCE PREVIOUS LAGUERRE STEP BY HALF.
      IF( .NOT. STARTD ) GO TO 1200
      ABSCL = 0.5*ABSCL
      CL = CMPLX( 0.5*ACL(1) , 0.5*ACL(2) )
C     HAS THE STEP BECOME NEGLIGIBLE .
      DX = DABS(DZNR) + DABS(DZNI)
      IF ( DX + ABSCL .NE. DX ) GO TO 801
C     OTHERWISE, ZERPOL HAS HUNG-UP.
      IF( FN .LT. E*XN**2 ) GO TO 1103
      CMESH = C112(N)
      CALL UNCLE( 75, MESH )
 1103 IF(DZNI) 1002, 1052, 1002

 1200 CONTINUE
C     IF .NOT. STARTD, HAS CZN BEEN ON THE INNER CAUCHY RADIUS.
      IF(SPIRAL) GO TO 1201
C     SET SPIRAL TO .TRUE.. PUT ZN ON THE INNER CIRCLE OF THE
C     ANNULUS CONTAINING THE SMALLEST ZERO IN THE DIRECTION OF THE
C     LAGUERRE STEP .
      SPIRAL = .TRUE.
      CSPIR = CMPLX( -1.25/XN , 1. )
      ABSCL = R0/XN**2
      C = CMPLX( (ACDIR(1)/ABDIR)*R0 , (ACDIR(2)/ABDIR)*R0 )
      GO TO 1202
C
C     SET ZN TO ANOTHER POINT ON THE SPIRAL .
 1201 C = CSPIR*CMPLX( DZNR , DZNI )
 1202 DZNR = AC(1)
      DZNI = AC(2)
      GO TO 802
      END
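
The quadratic-factor division carried out in block 900 of the listing above can be illustrated by a minimal sketch, written here in Python rather than FORTRAN IV. It is not a transcription of the listing: the names `divide_by_quadratic` and `value_at` and the list-based interface are assumptions made for illustration, and the scaling, the error bound E and the over/underflow tests are left out. The sketch shows only why the value of the polynomial at z = x + iy can be read off the last two numbers of the recurrence.

```python
def divide_by_quadratic(coeffs, x, y):
    """Divide a real polynomial by t**2 - 2*x*t + (x*x + y*y).

    coeffs holds the coefficients from the leading term down.  For a
    polynomial of degree n >= 1 the returned list z satisfies
        p(t) = (t**2 - 2*x*t + r) * q(t) + z[n-1]*(t - x) + z[n],
    where q has coefficients z[0], ..., z[n-2] and r = x*x + y*y.
    """
    n = len(coeffs) - 1
    if n < 1:
        raise ValueError("degree must be at least 1")
    x2 = x + x
    r = x * x + y * y
    z = []
    for i, a in enumerate(coeffs):
        # The last step multiplies by x instead of 2*x, which turns the
        # usual remainder factor (t - 2*x) into (t - x).
        m = x if i == n else x2
        prev1 = z[i - 1] if i >= 1 else 0.0
        prev2 = z[i - 2] if i >= 2 else 0.0
        z.append(a + (m * prev1 - r * prev2))
    return z


def value_at(coeffs, x, y):
    """Value of the polynomial at x + i*y, read off the remainder."""
    z = divide_by_quadratic(coeffs, x, y)
    # The quadratic factor vanishes at x + i*y, so only the remainder
    # z[n-1]*(i*y) + z[n] survives.
    return complex(z[-1], y * z[-2])
```

For instance, `value_at([1.0, -6.0, 11.0, -6.0], 1.0, 2.0)` evaluates t**3 - 6t**2 + 11t - 6 at 1 + 2i using real arithmetic only, which is the economy the quadratic factor buys.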
7094-II SYSTEM SUPPORT FOR NUMERICAL ANALYSIS*

W.  Kahan 

Department  of  Computer  Science 
University  of  Toronto 

ABSTRACT.  This  is  the  first  half  of  a  progress  report  on  the  author's 
efforts  to  improve  the  performance  of  IBSYS  in  the  following  areas  of 
FORTRAN  IV  programming: 

1.  Error-traces  and  diagnostic  messages  to  locate  and  explain 
flaws  found  while  executing  FORTRAN  programs. 

2.  Post-mortem  facilities  via  the  FORTRAN  IV  statement 

IF  (KICKED(OFF)).  .  . 

3.  A  consistent,  sane  and  flexible  treatment  of  over/underflow  and 
related  phenomena. 

4.  Digit  manipulation  (like  rounding)  via  FORTRAN  built-in  functions. 

5.  The  eradication  of  anomalies  in  the  compiler  (IBFTC)  and 
the  FORTRAN  library  (IBLIB). 

6.  The  expansion  of  the  FORTRAN  library  to  include  reliable  and 
convenient  subprograms  for  the  solution  of  standard  numerical 
problems like systems of linear equations,
              polynomial equations,
              eigenproblems,
              minimax approximation,
              fitting data by least squares,
              systems of ordinary differential equations,
              etc.

Items  1  to  5  are  herein  regarded  as  essential  prerequisites  to  the 
accomplishment  of  item  6  in  such  a  way  that  users  of  these  subprograms 
need  not  supplement  their  own  competency  in  mathematics,  science, 
engineering  or  the  humanities  by  a  hyperfine  proficiency  at  both  numerical 
analysis  and  the  debugging  of  systems  programs.  Each  of  the  six  areas  will 


*This article previously appeared in SHARE SSD No. 159. We wish to
thank  the  editors  of  SHARE  for  permission  to  publish  it  in  these  Proceedings. 




be  discussed  in  a  correspondingly  numbered  section  of  this  report,  which 
begins  by  introducing  the  motivations  for  and  the  constraints  upon  the 
author's efforts. Sections 1 to 3 follow; sections 4 to 6 will be issued
separately  later. 

INTRODUCTION. For as long as electronic computers have been
in use (since 1949 at the University of Toronto), there has existed a steadfast
policy to widen the range of intellectual disciplines that might benefit
from the machine. That policy is partly responsible for a decline in the
numerical sophistication of users which has yet to be compensated by an
increased sophistication in the programs they can use. Despite intensive
attempts to educate them in the arts of computation, too many new users
attribute to the numerical library subprograms the infallibility of a
mathematical proof. They shall be disillusioned. To what extent can their
disillusionment be written off as part of their education? To what extent
can their dissatisfaction be traced to shoddy computing systems? There is
room for improvement in both the quality of education and the quality of
computer performance. But you cannot teach an old dog new tricks, and
you cannot teach a new dog very much. Therefore the bulk of the
improvement must and can come in the performance of computer systems.

The performance of IBM's IBSYS on the 7094-II has left a lot of room
for  improvement.  The  improvements  listed  here  were  motivated  almost 
entirely  by  the  inadequacies  uncovered  during  the  author's  researches  into 
numerical  methods.  The  object  of  the  researches  was  to  produce  working 
programs  about  which  might  be  proved  something  simple  and  useful  to  a 
numerically  unsophisticated  but  otherwise  intelligent  and  educated  user. 

As  a  by-product  of  these  researches,  the  following  vague  generalities  have 
emerged: 

-Computation  costs  most  when  its  result  is  not  known  to  be  right 
nor  wrong,  because  it  costs  so  much  to  find  out  what  is  wrong 
and  why.  Costs  can  be  cut  by  a  small  amount  of  self-doubt  applied 
early. 

-Whether or not the purpose of computing be "insight", its most
dependable  benefit  is  hindsight.  Programmers  dislike  forgoing 
this  benefit  through  lack  of  foresight. 

-Errors,  anomalies  and  arbitrary  restrictions  hurt  most  when 
they  are  too  rare  to  remember  but  not  rare  enough  to  ignore. 

These  generalities  have  influenced  the  many  decisions  on  questions  of 
detail  which  arose  during  the  work  on  the  system.  A  more  decisive  influence 
was  exerted  by  three  constraints: 
First, it was deemed essential that programs be capable of conversion
to whatever machine might replace the 7094-II, and so it was decided that
all  numerical  subprograms  be  written  in  a  language  like  FORTRAN  or 
ALGOL,  except  where  efficient  coding  was  so  obviously  machine  dependent 
that  the  assembly  language  MAP  was  used.  I  chose  FORTRAN  IV  in 
preference  to  ALGOL.  I  would  rather  fight  than  switch.  I  am  still  fighting 
with  the  latest  version  (13)  of  the  IBFTC  compiler  to  incorporate  all  the 
modifications  which  I  had  introduced  into  the  previous  version,  and  further 
modifications  to  correct  newly  discovered  deficiencies. 




Second,  since  no  one  had  anticipated  a  need  to  rewrite  IBSYS  or  IBFTC 
in its entirety, no resources were allocated for such a task. Therefore,
IBSYS  and  IBFTC  have  been  modified  as  little  as  possible,  instead  of  being 
replaced.  The  modifications  have  cost  about  three  man-years  of  work  all 
told,  much  of  which  has  been  dissipated  in  the  transfer  of  the  modifications 
from  version  12  to  version  13  of  IBSYS. 


Third,  but  most  important,  is  our  decision  that  the  Toronto  version  of 
IBSYS  remain  compatible  with  the  standard  IBM  IBSYS.  Consequently, 
any  FORTRAN  IV  program,  even  if  it  be  in  the  form  of  a  binary  object- 
program  deck,  which  has  been  designed  for  and  runs  correctly  on  a  7094 
under  standard  IBM  IBSYS  with  a  hundred  or  so  storage  locations  to  spare 
runs at least as well under our modified system. If the program be
recompiled with no other modification then the user may benefit from our improved
diagnostics, especially where division by zero is concerned. Most of the
users of our 7094-II are unaware of any departure from standard. But
programs  which  run  well  on  our  system  sometimes  fail  mysteriously  at 
other  7094  installations. 


In  this  report  an  attempt  will  be  made  to  discriminate  between  IBM's 
standard  IBSYS  and  our  modified  IBSYS  by  referring  to  theirs  in  the  past 
tense whenever it differs from ours. Further details about IBM's IBSYS
can be obtained from their manuals:

C28-6248    (IBSYS monitor)
C28-6389    (IBJOB; loader and library)
C28-6390    (IBFTC FORTRAN compiler)


Further details about our modified system can be found in "The
Programmers' Reference Manual" 2nd ed. obtainable from

The  Secretary,  Institute  of  Computer  Science, 

University  of  Toronto, 

Toronto  5,  Ontario, 

Canada. 
and henceforth referred to as the PRM. Program listings are obtainable
too  if  requested  by  name. 

1. ERROR-TRACES AND DIAGNOSTIC MESSAGES. It may seem
peculiar that a Numerical Analyst be preoccupied with the System
Programmer's traditional responsibility for error-traces, diagnostics and
post-mortem information. But let us watch the Numerical Analyst at work.
Much of his computer time is dissipated by the diagnostics and post-mortems
which he receives while trying to discover why his algorithms do not work
as well as he had hoped. From time to time he hands one of his
subprograms on to some other user numerically less sophisticated than himself,
and in so doing he tacitly shares with the Systems Programmers some
responsibility for issuing diagnostics. His program may produce diagnostic
messages for different reasons than merely to signal its own collapse.
Diagnostics may be the only "correct" answers that the program can deliver
in response to problems outside the intended domain of its applicability,
especially when the program's domain cannot easily be defined other than
by attempting to execute the program. For example, a hopelessly
ill-conditioned linear system


$A x = b$

is most easily identified when a sound linear-equation-solver fails to solve
the system for x but exhibits instead a near linear dependence d in the
left hand side A; i.e.


$\|A d\| / (\|A\| \, \|d\|) \approx 0$ .


The  Numerical  Analyst's  subprogram  ought  to  pass  on  this  kind  of  diagnostic 
information in a form easily interpreted either by the user's calling
program or by the user personally.
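
The near-dependence test just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the maximum (infinity) norm is chosen arbitrarily since the text does not name a norm, and a real solver would discover the vector d itself rather than be handed it.

```python
def vector_norm(v):
    """Maximum norm of a vector."""
    return max(abs(c) for c in v)


def matrix_norm(a):
    """Matrix norm subordinate to the maximum norm (largest row sum)."""
    return max(sum(abs(c) for c in row) for row in a)


def dependence_ratio(a, d):
    """Return ||A d|| / (||A|| ||d||); a tiny value exposes a near
    linear dependence d among the columns of A."""
    ad = [sum(aij * dj for aij, dj in zip(row, d)) for row in a]
    return vector_norm(ad) / (matrix_norm(a) * vector_norm(d))


# The second row is almost twice the first, so d = (2, -1) nearly
# annihilates A.
A = [[1.0, 2.0], [2.0, 4.000001]]
ratio = dependence_ratio(A, [2.0, -1.0])
```

A calling program could compare `ratio` with a small multiple of the rounding unit and report the system as hopelessly ill conditioned instead of returning a meaningless x.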

The latter form of diagnostic is usually a message printed amidst the
user's  output  and  is  often  the  consequence  of  an  error  or  oversight.  The 
crucial  question  is 


"Where  was  this  error  committed?" 

but  no  computer  program  can  answer  this  question.  The  best  that  can  be 
done  automatically  is  to  answer  the  question 

"Where  did  the  program  first  encounter  some 
anomalous  consequence  of  the  error?" 
The answer takes the form of an Error-Trace. Under IBM's IBJOB
this would be provided by library subprogram .FXEM., the FORTRAN
execution Error Monitor. Let us examine an error-trace typical of those
produced by IBM's .FXEM. For example, suppose line 2 of the user's
main program MAIN called a subprogram SUB1 in whose line 25 was a
call to SUB2 in whose line 17 was a reference to SQRT(-4.0). When this
reference was executed, the SQRT program would detect the inappropriately
negative argument and call .FXEM. (say in line 31) to produce an
error-trace and diagnostic message. IBM's error-trace would look like
this:


ERROR TRACE CALLS IN REVERSE ORDER

CALLING     IFN OR      ABSOLUTE
ROUTINE     LINE NO     LOCATION

SQRT        31          17621
SUB2        17          14513
SUB1        25          07762
MAIN        2           05413

The  names  in  the  first  column  are  the  deck-names  assigned  by  the  user 
to  his  subprograms  (or  else,  in  our  modified  system,  assigned  by  default 
by  the  system).  The  line  numbers  or  "Internal  Formula  Numbers"  in  the 
second  column  refer  to  numbers  printed  in  the  programs'  source  listings, 
and  can  be  exploited  by  the  FORTRAN  IV  programmer  without  recourse 
to  storage  maps.  For  this  reason,  the  third  column  of  absolute  octal  core 
locations is of secondary value to the FORTRAN programmer. It is a
great  convenience  that  he  can  ignore  this  column  and  dispense  with  storage 
maps  most  of  the  time. 


The  completeness  of  the  error-trace  shown  above  is  one  of  its  most 
valuable  features.  Complicated  programs  can  contain  several  references 
to  the  SQRT  subroutine,  and  it  is  vital  that  the  path  of  control  to  the  invalid 
reference  be  laid  out  explicitly.  The  complete  error-trace  is  even  more 
valuable  when  languages  which  permit  recursive  procedures  are  used. 

If  a  user  were  instead  provided  with  only  the  reference  to  SQRT  (or  only  to 
SQRT  and  SUB2)  in  the  error-trace  above,  he  might  waste  a  lot  of  time 
checking  through  all  of  his  calls  to  SUB2  in  an  attempt  to  uncover  the 
faulty  one. 
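
The shape of such a complete trace can be mimicked by a short sketch. The function name and the representation of the chain of calls as a list of pairs are assumptions made for illustration; they say nothing about how .FXEM. stores its linkage information, and the column of absolute locations is omitted.

```python
def format_error_trace(call_chain):
    """Build the lines of a trace from a chain of calls.

    call_chain lists (calling routine, line number of the call) pairs
    from the outermost call down to the routine that detected the
    error.  The trace prints them innermost first.
    """
    lines = [
        "ERROR TRACE CALLS IN REVERSE ORDER",
        "CALLING ROUTINE   IFN OR LINE NO",
    ]
    for routine, line_no in reversed(call_chain):
        lines.append(f"{routine:<17} {line_no}")
    return lines


# The example from the text: MAIN line 2 calls SUB1, whose line 25 calls
# SUB2, whose line 17 refers to SQRT, whose line 31 calls the monitor.
chain = [("MAIN", 2), ("SUB1", 25), ("SUB2", 17), ("SQRT", 31)]
trace = format_error_trace(chain)
```

Dropping all but the first one or two entries of `chain` reproduces the truncated trace that the paragraph above warns against: the faulty call to SUB2 could then no longer be told apart from the harmless ones.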

IBM's  .FXEM.  would  print  out  a  two-line  diagnostic  message  and 
provide  a  means  to  exercise  options  regarding  kick-off  or  continued 
execution  following  the  diagnostic  and  error-trace.  But  .FXEM.  suffered 
from  two  defects. 
One, the easiest to remedy, was that .FXEM. could be called only
from  a  MAP  assembly  language  program.  We  fixed  this  by  providing 
a  program  called  UNCLE;  any  programmer  can  kick  himself  off  (and 
produce  an  error-trace  plus  post-mortem  debugging  output)  by  executing 

CALL  UNCLE  . 

He  can  offer  users  of  his  program  a  limited  range  of  kick-off-or-continue 
options  by  writing 


CALL  UNCLE  (N) 

with  a  suitably  chosen  integer  expression  N.  He  can  supply  one  or  two 
diagnostic  messages  too  by  writing 

CALL  UNCLE  (N,  Message)  or 

CALL  UNCLE  (N,  Message  1,  Message  2)  . 

The  messages  can  be  inserted  literally  as  Hollerith  strings  or  they  can  be 
referenced  as  arrays  of  alphanumeric  data.  In  the  latter  case,  rudimentary 
binary-to-BCD  conversion  facilities  are  available  to  permit  integer  valued 
variables  like  indices  or  error-codes  to  be  inserted  into  the  diagnostic 
without  first  reserving  core  storage  for  the  panoply  of  FORTRAN  input/ 
output  subprograms.  This  last  is  an  important  consideration  when  program 
overlay is required during execution. (For more details about UNCLE,
consult the PRM.)
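
What a rudimentary binary-to-decimal conversion of this kind involves can be sketched as follows. The routine names, the right-justified field and the row of asterisks for a value too wide for its field are all assumptions chosen for illustration; the PRM, not this sketch, describes what UNCLE actually provides.

```python
def int_to_digits(value, width):
    """Right-justified decimal characters for an integer, produced by
    repeated division by ten and without any formatted-output library."""
    negative = value < 0
    magnitude = -value if negative else value
    digits = []
    while True:
        magnitude, remainder = divmod(magnitude, 10)
        digits.append("0123456789"[remainder])
        if magnitude == 0:
            break
    if negative:
        digits.append("-")
    text = "".join(reversed(digits))
    if len(text) > width:
        return "*" * width  # assumed convention for a field that is too narrow
    return text.rjust(width)


def insert_integer(message, position, value, width):
    """Overwrite width characters of message, starting at position,
    with the decimal form of value."""
    field = int_to_digits(value, width)
    return message[:position] + field + message[position + width:]


diagnostic = insert_integer("ERROR CODE ###### IN ROW", 11, 73, 6)
```

The point of so small a routine is the one made in the text: an index or error code can be placed in a message without loading the full input/output package.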

.FXEM.'s second defect was that it could cope only with what I call
"scheduled  errors";  these  are  errors  each  of  which  is  discovered  in  a 
subprogram  which,  when  it  calls  .FXEM.  to  produce  an  error-trace, 
can  supply  whatever  linking  information  is  needed  by  .FXEM.  to  start 
the  error-trace.  For  example  SQRT(-4.0)  is  a  scheduled  error  because 
SQRT  is  called  in  a  standard  way.  But  when  unscheduled  errors  like  over/ 
underflow,  division  by  zero,  running  overtime,  ...  ,  were  detected  they 
would  "trap",  i.e.  cause  interrupts  which  transferred  control  to  appropriate 
subprograms  without  carrying  the  standard  linking  information  that  made  an 
error-trace  possible.  Consequently,  the  diagnostics  for  unscheduled 
errors answered the question "where?" with an absolute octal core
location, but could not answer the question

"How  did  I  get  there?" 

That  IBSYS's  standard  linking  sequence  contained  a  partial  answer  to 
the  last  question  was  widely  recognized.  The  first  effort  to  extract  a  full 
answer was made by G. Wiederhold and G. D. Johnson at Berkeley (Univ.
of California) in 1963. Their work has appeared in SHARE SSD 121 of
May 21/64 and SDA's 3066-7. A similar scheme was devised by J. Leppik,
G.  Howard  and  the  author  at  Toronto  in  1964.  Our  scheme  differs  from 
theirs  mainly  in  that  ours  is  simpler  to  use,  slightly  less  flexible,  and 
fully  compatible  with  the  standard  IBM  system. 

The first step in both schemes is to revise the standard SAVE
pseudo-operation by which subprograms are expected to save and restore index
registers,  control  linkages,  etc.  When  IBM's  SAVE  was  executed  upon 
entry  to  a  subprogram  SUB,  it  used  to  save  in  a  cell  called  SYSLOC 
the  pointer  to  the  statement 


CALL  SUB  , 

but no subsequent use was made of SYSLOC. We have added two
instructions to SAVE whose effect is to store the same pointer, during the
RETURN  from  SUB  to  the  instructions  following 

CALL  SUB  , 

in  such  a  way  that  the  contents  of  SYSLOC  show  whether  SUB  has  just 
been  entered  or  has  just  returned.  This  modification  has  no  effect  upon 
the way IBM's .FXEM. behaves for scheduled errors.

Next, I rewrote .FXEM. so that it can be called from a trap-handling
program. Such a CALL is distinguished from other standard
CALLS  by  the  absence  of  certain  otherwise  expected  linking  information, 
the  lack  of  which  forces  .FXEM.  into  a  new  mode  of  action  which  examines 
SYSLOC  to  produce  the  first  line  of  the  error-trace. 

The  behaviour  of  the  new  .FXEM.  is  best  illustrated  by  an  example. 
Suppose  that  SUB2  in  the  example  above  contains,  besides  SQRT(-4.0), 
a  division  which,  when  executed,  turns  out  to  be  a  division  of  zero  by 
zero.  The  result  is  the  following  diagnostic  (in  which  the  contents  of  the 
second  line  depend  upon  an  option  selected  by  the  user): 
0.0/0.0 ERROR AT
RESULTS IN 0.0 or EXECUTION TERMINATED
ERROR-TRACE WITH CALLS IN REVERSE ORDER     CODE 25

CALL IS IN      AT IFN OR     ABSOLUTE
DECK NAMED      LINE NO.      LOCATION

SUB2            17 +          14513
SUB1            25            07762
MAIN            2             05413

The  important  change  shows  up  in  the  +  sign  after  the  line  no.  17. 

This  means  that  the  announced  anomaly  was  detected  during  or  after  (in 
time)  the  execution  of  line  no.  17  of  SUB2,  but  before  any  subsequent 
CALL  was  executed.  Since  SUB2  has  a  call  to  SQRT  in  line  17  at 
location 14513 (cf. the previous error-trace), and the 0.0/0.0 occurred
five  words  ahead  of  this  location  in  the  program,  it  seems  likely  that  the 
program  was  executing  a  loop,  perhaps  a  DO-loop,  which  contains  the 
offending  division  just  a  line  or  two  in  the  listing  ahead  of  the  square  root; 
and  this  loop  was  executed  at  least  once  before  the  divisor  vanished. 
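
The bookkeeping behind the + sign can be modelled by a toy sketch. Everything in it is an assumption made for illustration: the class name, the representation of the SYSLOC cell as a pair plus a flag, and in particular the unmarked line produced when the last event was an entry, a case the text does not spell out.

```python
class Linkage:
    """Toy model of the SYSLOC cell: the site of the most recent CALL
    together with a flag telling whether that call has returned."""

    def __init__(self):
        self.site = None
        self.returned = False

    def enter(self, deck, line):
        # Executed on entry to the callee: remember where the CALL was.
        self.site = (deck, line)
        self.returned = False

    def leave(self, deck, line):
        # Executed during RETURN: same pointer, marked as returned.
        self.site = (deck, line)
        self.returned = True

    def first_trace_line(self):
        # What a trap handler could print as the head of the trace.
        deck, line = self.site
        if self.returned:
            return f"{deck} {line} +"
        return f"{deck} {line}"  # assumed form; not specified in the text


syslog = Linkage()
syslog.enter("MAIN", 2)    # MAIN line 2 calls SUB1
syslog.enter("SUB1", 25)   # SUB1 line 25 calls SUB2
syslog.enter("SUB2", 17)   # SUB2 line 17 calls SQRT
syslog.leave("SUB2", 17)   # SQRT returns into SUB2
head = syslog.first_trace_line()   # a trap now reports "SUB2 17 +"
```

The sketch covers only the first line of the trace; the remaining lines come from the standard linkage exactly as for a scheduled error.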

The  detective  work  in  the  last  sentence  is  not  typical;  usually  the 
error  can  be  located  by  the  most  superficial  inspection.  But  the  need  for 
any  detective  work  at  all  is  an  unfortunate  consequence  of  the  way  IBM's 
FORTRAN  IV  compiler  works.  Instead  of  identifying  every  line  in  the 
symbolic listing with a line number that .FXEM. could deduce at
execution  time  (for  example,  by  locating  a  dummy  instruction 

TDC  ID,  O,  LKDR 

at the beginning of the coding emitted by the compiler for line no. ID of
the FORTRAN subprogram whose linkage information can be found at
LKDR), the compiler assigns a useable line number only when a CALL is
generated. Since an implicit CALL is generated for all references to
FUNCTION subroutines, as well as for most exponentiations of the form
X**J and X**Y, for input/output, for complex multiplication and division,
and for a computed GO TO (n1, n2, ..., nm), L, there are few programs
whose listed line numbers are too sparse for a successful interpretation
of the error-trace. And, at worst, the unscheduled error is located to
within one subprogram.

The CODE 25 at the head of the error-trace tells the programmer how
to exercise his option to define 0.0/0.0 in one of two ways; either

0.0/0.0 = 0.0 and continue execution, or
0.0/0.0 = EXECUTION TERMINATED.


For  example,  the  first  option  is  the  result  of  executing 

CALL  KIKOPT  (25,  1) 
while  the  second  results  from 

CALL KIKOPT (25, 0) .

The  reader  is  referred  to  the  PRM  for  precise  details  about  available 
options and how to exercise them conveniently. What follows is a
condensation.

The  PRM  contains  a  table  of  error  codes  and  messages  (cf.  Fig.  25 
and  the  section  "Subroutine  Library  Error  Messages"  in  IBM's  IBJOB 
manual,  Form  C28-6389-1)  which  describes  for  each  code  its  error 
condition,  the  options  available,  and  which  option  is  assumed  by  the  system 
in  default  of  a  request  to  the  contrary.  The  default  option  is  usually  to 
provide  a  message  and  then  continue  execution  in  some  reasonable  way. 

I  believe  that,  taken  together  with  the  other  diagnostic  facilities  in  our 
system, our surprisingly simple set of options covers almost all
circumstances satisfactorily. For serious errors we assign positive codes, like
+25 for 0.0/0.0, to signify that the allowed options are

+  1)  Give  a  message  and  error-trace,  and  then  continue  reasonably, 
or 

+0)  Give  a  message  and  error-trace,  and  then  terminate  execution, 
(Some  errors,  like 

GO  TO  (1, 2,  3),  4 

are so serious that option +1 is denied.) For milder errors we assign
negative  codes,  like  -13  for  SQRT  (-4.0),  which  signify  that  the  allowed 
options  are 

-1)  Give  a  message  and  error-trace,  and  then  continue  reasonably, 
or 

-0)  Give  no  message  nor  error-trace;  just  continue  reasonably. 

The  meaning  of  "continue  reasonably"  is  discussed  later  in  this  report. 
For now it suffices to give a few examples:
Error Condition and "Reasonable" Response      Code

SQRT(-X) = -SQRT(X)                            -13
LOG(-A) = LOG(ABS(A))                          -10
0.0**0 = 1.0                                   -3
0**0 = 1                                       -1
0.0**0.0 = 1.0                                 +6
0.0/0.0 = 0.0                                  +25


*Footnote: We allow programmers to write LOG(X) or
ALOG(X) interchangeably as they please
rather than penalize them for the venial sin
of omitting the A.
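
The two families of options described above can be condensed into a small sketch. It is a model of the rules as stated in the text, not of the Toronto implementation: the Python names are hypothetical, codes are looked up by magnitude, the option indicator is taken to default to 1 (message, then continue), and the refusal of option +1 for the most serious errors is ignored.

```python
# Codes taken from the examples in the text; the split into two tables
# mirrors the sign of the code.
SERIOUS = {25: "0.0/0.0", 6: "0.0**0.0"}                         # positive codes
MILD = {13: "SQRT(-X)", 10: "LOG(-A)", 3: "0.0**0", 1: "0**0"}   # negative codes

# One option indicator (a binary digit) per code.
option = {code: 1 for code in list(SERIOUS) + list(MILD)}


def kikopt(code, setting):
    """Set the option indicator of an error code to 0 or 1."""
    if setting not in (0, 1):
        raise ValueError("option must be 0 or 1")
    option[code] = setting


def respond(code):
    """Return (give_message_and_trace, continue_execution)."""
    if code in SERIOUS:
        # +1: message, then continue;  +0: message, then terminate.
        return (True, option[code] == 1)
    # -1: message, then continue;  -0: continue silently.
    return (option[code] == 1, True)
```

With these rules `respond(25)` is `(True, True)` by default, `kikopt(25, 0)` turns it into `(True, False)`, and `kikopt(13, 0)` silences SQRT(-X) without ever terminating execution.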

Programmers,  particularly  writers  of  library  subprograms,  can 
easily  provide  other  kinds  of  optional  responses  to  error  conditions 
detected  by  their  own  subprograms  because  the  status  of  the  option- 
indicator  (a  binary  digit)  associated  with  any  error-code  number  can  be 
sensed and stored as well as changed via KIKOPT. A complicated program
may have several error-codes assigned to it, but this causes no problems
because 280 codes are available. Programmers are free to use error-codes
as flags or flip-flops in a way comparable to the use of sense-switches
and  sense-lights  on  the  older  slower  machines. 

A comment is required to explain that last .FXEM. option -0 which,
in effect, allows .FXEM.'s activity to be suppressed entirely when the
error is a mild one with a negative code. Some of these errors are better
described as differences of opinion about the most apt definition of a
function or an expression, as in the cases of 0**0 = 1 and 0.0**0 = 1.0 (cf. the
Taylor series $\sum_{j=0}^{\infty} a_j x^j$ at x = 0.0). In these cases the warning messages
serve only to remind the user that my definitions are not universally accepted
in the computing world. If he is satisfied to do things my way, he can turn
the message off. If he prefers another way, he can easily change the
relevant program to his own specifications with the aid of the documentation
which we supply.
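
The Taylor-series argument for these definitions can be checked with a short sketch. Python is used here only because it happens to adopt the same convention for zero raised to the power zero; that coincidence is an observation about Python, not about any FORTRAN system, and the function name is hypothetical.

```python
def power_series(coeffs, x):
    """Sum a_j * x**j term by term, exactly as the series is written."""
    return sum(a * x ** j for j, a in enumerate(coeffs))


# At x = 0.0 the series must reduce to its constant term a_0, which
# requires the j = 0 term a_0 * 0.0**0 to equal a_0, i.e. 0.0**0 = 1.0.
constant_term = power_series([3.0, 5.0, 7.0], 0.0)
```

Any other value for 0.0**0 would force every program that sums a power series in this direct way to treat x = 0.0 as a special case.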

Other errors with negative codes sometimes represent minor oversights;
an example is

LOG(-X) = LOG(ABS(X)) , Code -10.


For  reasons  discussed  later,  our  policy  is  to  try  not  to  terminate  execution 
because  of  such  an  oversight.  Rather,  it  seems  better  to  continue  and  find 
out what else the programmer overlooked. We do not encourage programmers
to exploit system side-effects to save the bother of a sign-test or




some  such  simple  instruction.  We  do  not  regard  the  -0  option  as  one 
which  should  be  employed  in  production  or  library  programs  to  correct 
oversights,  except  possibly  temporarily,  because  this  type  of  hidden 
coding  is  so  difficult  to  remember  when  late-hatching  bugs  are  being 
sought. 

To  implement  the  new  .FXEM.  and  error-trace  required  several 
man-months  of  work,  most  of  which  was  spent  tracking  down  anomalies. 
For  example,  several  input/output  programs  supplied  as  part  of  earlier 
versions  of  FORTRAN  IV  were  found  to  use  non-standard  subprogram 
linkages, and these had to be repaired to allow even the old .FXEM. to
produce meaningful error-traces before they were further modified to
work with the new .FXEM. Every library program had to be examined;
here  we  reaped  an  unexpected  reward  when  we  discovered  that  the  new 
.FXEM.  makes  possible  a  shorter  and  faster  subprogram  linkage  to 
certain  library  programs  like  SQRT,  SIN,  COS,  LOG,  EXP,  complex 
multiply,  complex  divide,  A**J,  and  others. 

But  one  large  job  remains.  The  FORTRAN  compiler  must  be  modified 
to  generate  standard  CALLs  to  Arithmetic  Statement  Functions  which  at 
the  present,  as  compiled  by  IBM's  FORTRAN  IV  v.  13,  use  non-standard 
CALLs in order to save about 7 microseconds per CALL. (One division
costs 8.4 microseconds.) Consequently both IBM's .FXEM. and ours
produce  error-traces  which  skip,  sometimes  confusingly,  over  references 
to  Arithmetic  Statement  Functions. 

2.  POST-MORTEM  FACILITIES.  We  prefer  to  think  of  kick-off  as 
an  act  of  desperation  on  the  part  of  a  subprogram,  and  therefore  try  not 
to  terminate  execution  unless  it  is  overwhelmingly  probable  that  continued 
execution  will  be  an  utter  waste.  There  is  little  risk  that  errors  like 
SQRT(-4.0) will be repeated millions of times to no good purpose, because
the  monitor  imposes  the  user's  own  limit  upon  the  total  number  of  lines 
of  printed  output,  thereby  protecting  him  from  a  million  lines  of  SQRT's 
diagnostic  and  error-trace.  Furthermore,  programmers  who  are 
especially  sensitive  to  a  waste  of  their  computer  time  allotment  can  use 
statements  like 

IF (CLOCK(TSTART) .GT. TMAX) CALL UNCLE

to kick themselves off when the elapsed time since

TSTART = CLOCK(0.0)

exceeds TMAX, at a cost of 70 microseconds per execution. (One square
root costs 64 microseconds.)
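The self-imposed time limit can be sketched in Python as follows. This is only an analogue under stated assumptions: `clock`, `check_budget` and `TimeLimitExceeded` are hypothetical names standing in for CLOCK, the IF statement and CALL UNCLE, and the 0.05 second budget is arbitrary.

```python
import time

class TimeLimitExceeded(Exception):
    """Stand-in for the effect of CALL UNCLE: the program kicks itself off."""

def clock(start):
    """Seconds elapsed since `start`; clock(0.0) returns the current reading."""
    return time.perf_counter() - start

def check_budget(tstart, tmax):
    # Analogue of: IF (CLOCK(TSTART) .GT. TMAX) CALL UNCLE
    if clock(tstart) > tmax:
        raise TimeLimitExceeded("time allotment exhausted")

tstart = clock(0.0)
iterations = 0
try:
    while True:                    # a loop that would otherwise never finish
        iterations += 1
        check_budget(tstart, tmax=0.05)
except TimeLimitExceeded:
    print("kicked off after", iterations, "iterations")
```

As in the text, the check is cheap enough to sit inside the loop, so the program gives up by its own decision and not the monitor's.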
But sometimes kick-off is the only reasonable response to an error.

This  response  gives  rise  to  a  class  of  programmer  who  has  only  one 
diagnostic  and  error-trace  to  show  for  his  several  seconds  (or  minutes) 
of  computer  time.  It  is  uncharitable  to  advise  him  that  he  should  have 
exercised  enough  foresight  to  provide  intermediate  output  as  insurance 
against such an event. Besides, he may reply

"I thought I had debugged that program!"

We  doubt  the  wisdom  of  the  widespread  tendency  to  inundate  every 
user  who  is  kicked  off  with  a  complete  dump  of  storage  willy-nilly.  This 
could  drown  him  in  octal  data  which  he  is  unlikely  to  be  able  to  read.  It 
is  a  costly  way  to  educate  students. 

The  ideal  solution  would  be  to  display  conveniently  just  those  variables 
which  have  figured  in  the  events  leading  up  to  the  debacle.  Our  solution 
is not ideal, but it is simple and flexible. It is an improved version of our
PMORT described in Comm. A.C.M. 7 (1964) p. 15. We allow the programmer
to write into his FORTRAN IV program a statement of the form

IF (KICKED(OFF)) <any executable statement>

<the next executable statement>

with the expectation that, because the value of the logical function KICKED
is always .FALSE., his program will merely execute <the next executable
statement>. But if and when his program is kicked off, the monitor will
give him the diagnostic and error-trace that he deserves and then, after
over-writing <the next executable statement> with CALL EXIT, will
execute <any executable statement>.

e.g. 1: IF (KICKED(OFF)) WRITE(...)

causes the desired information to be written out if, and only after, the program
has been kicked off. The programmer can choose a FORMAT to suit himself
or, if more convenient, he can use the simple unformatted output provided
by the NAMELIST feature of FORTRAN IV; or he can CALL DUMP and be
drowned.

e.g. 2: IF (KICKED(OFF)) CALL ... or

GO TO ...

causes  the  desired  transfer  of  control  to  take  place  after  kick-off,  and 
thus  permits  a  user  to  store  valuable  data  on  magnetic  tapes  and  ask  the 
operator  to  save  them.  Or  he  can  call  a  complicated  diagnostic  program  of 
his  own,  or  he  can  try  again  to  solve  his  problem  by  some  method  other 
than the one which failed. The monitor will allow, say, 20 seconds and
300 printed lines of computer activity after the first kick-off. Of course,
any second kick-off is final despite further IF (KICKED(OFF)) ... requests.
Because  the  user  has  recourse  to  KICKED,  writers  of  library  and  systems 
programs  are  under  less  pressure  when  they  have  to  decide  whether  an 
anomalous  condition  should  terminate  execution  or  just  produce  a  warning. 

Programmers  are  encouraged  to  use  KICKED  as  often  as  they  like 
in  both  FORTRAN  and  MAP  assembly  language  programs,  and  they  can 
leave  these  KICKED  statements  in  production  programs  as  insurance 
against  the  remote  possibility  that  an  undiscovered  bug  may  terminate 
execution  in  a  cloud  of  mystery.  Each  executed  reference  to  KICKED 
consumes  less  than  14  microseconds  (less  than  two  division  times)  so 
KICKED  can  be  used  in  fairly  tight  loops  without  seriously  wasting  time. 

The  monitor  will  respond  at  kick-off  only  to  the  last  executed  reference 
to  KICKED. 
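The behaviour of KICKED described above can be imitated in a few lines of Python. This is a sketch under assumptions, not the monitor's actual mechanism: the classes `Monitor` and `KickOff` are hypothetical, an exception stands in for kick-off, and a stored callable stands in for the statement attached to the IF.

```python
class KickOff(Exception):
    """Raised where the original system would terminate execution."""

class Monitor:
    def __init__(self):
        self.post_mortem = None          # last request registered via kicked()
        self.log = []

    def kicked(self, request):
        """Remember `request`; always return False during normal execution."""
        self.post_mortem = request       # only the last executed reference counts
        return False

    def run(self, program):
        try:
            program(self)
        except KickOff as err:
            self.log.append(f"diagnostic: {err}")
            request, self.post_mortem = self.post_mortem, None
            if request is not None:
                try:
                    request()            # granted once, after the diagnostic
                except KickOff:
                    self.log.append("second kick-off is final")

def program(mon):
    state = {"i": 0, "partial": 0.0}
    for i in range(1, 6):
        state["i"] = i
        if mon.kicked(lambda: mon.log.append(f"post-mortem: {state}")):
            pass                         # never reached in normal execution
        if i == 4:
            raise KickOff("SQRT of negative argument")
        state["partial"] += i ** 0.5

mon = Monitor()
mon.run(program)
print("\n".join(mon.log))
```

The post-mortem request runs only after the diagnostic, and a second kick-off inside it is not granted another request, which mirrors the two rules stated in the text.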

An  important  limitation  upon  KICKED  was  imposed  by  the  absence  of 
any  block  structure  in  FORTRAN  comparable  to  that  in  ALGOL,  and  by 
the  way  that  indexing  is  optimized  in  FORTRAN.  This  limitation  exists 
because,  whenever  kick-off  occurs  in  some  subprogram  remote  from  the 
one containing the KICKED statement and then control is passed to <any
executable statement> after the IF (KICKED(OFF)), no attempt is made to
restore  index  registers  to  the  state  they  were  in  when  KICKED  was  called 
nor  to  re-set  tapes  to  their  former  positions.  More  important,  there  is  no 
way  to  reproduce  the  effect  of  those  instructions  which  may  have  been  placed 
in  "optimum"  positions  ahead  of  the  call  to  KICKED  in  order  to  initialize 
index  registers  and  addresses  as  efficiently  as  possible  from  the  point  of 
view  of  the  normal  sequence  of  control.  For  example,  if  kick-off  occurs 
during  the  computation  of  FCN  in  the  sequence 

      DO 3 J = 1, 10
      A(1, J) = J - 1
      DO 3 I = 1, J
      IF (KICKED(OFF)) WRITE(...) I, J, B(I), B(J), (A(K,J), K=1,J)
    3 A(I+1, J) = FCN(B(I), B(J), A(I+1, J)) + A(I, J)

there  is  no  way  at  kick-off  time  to  move  the  numbers  I  and  J  from 
storage  into  the  appropriate  cells  and  index  registers  for  the  references 
to B(I), B(J), A(K,J) and "K = 1, J" following the call to KICKED.

A  second  limitation  shows  up  when  program  overlay  takes  place;  there 
is  no  simple  way  to  detect  whether  <any  executable  statement>  in  the 
IF (KICKED(OFF)) statement has been partially overlaid, or whether it
refers to data which has been overlaid. Consequently we inserted an instruction
in .LOVRY, the overlay handling subprogram, which causes the
Whenever the array Y is changed, indicate which element too;

Y(2) = .74131042 E-18 .

Whenever the third column of array Z is changed, say so;

Z(13, 3) = 0.0 .

Whenever the subprogram PROG is called, write out its arguments;

CALL PROG (13, 27.421493, Y) WITH

Y(1) = 1.4012362
Y(2) = .74131042 E-18
Y(3) = 0.0 .

If PROG is a function, write out its value too;

PROG (13, 27.421493, Y) = 1.7014 E38 WITH
Y(1) = etc.

Whenever  statement  n  is  executed,  say  so.  If  this  is  a  logical  IF 
statement,  tell  what  happened. 

The  MONITOR  facility  as  described  above  has  been  implemented 
at  least  partially  in  several  compilers;  unfortunately,  ours  is  not  one 
of  them.  The  problem  is  to  deal  with  the  statement 

IF  (KICKED(OFF))  MONITOR . 

for  which  the  nicest  solution  would  be  a  retroactive  display  of,  say,  the 
last  300  lines  of  output  which  would  have  been  produced  if  that  MONITOR 
statement  had  not  been  bypassed.  Some  compilers  already  have  a  feature 
of  this  kind;  the  author  envies  their  users. 
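The retroactive display wished for here amounts to keeping a bounded history of trace lines and printing it only on demand. A minimal Python sketch follows; the class `RetroMonitor` and its methods are hypothetical, and the limit of 3 lines in the usage merely stands in for the 300 suggested in the text.

```python
from collections import deque

class RetroMonitor:
    """Keep only the most recent trace lines; show them after a failure."""
    def __init__(self, max_lines=300):
        self.recent = deque(maxlen=max_lines)   # older lines drop off silently

    def assign(self, array, name, index, value):
        array[index] = value
        # Report 1-based subscripts, as a FORTRAN programmer would expect.
        self.recent.append(f"{name}({index + 1}) = {value!r}")

    def dump(self):
        return list(self.recent)

mon = RetroMonitor(max_lines=3)
y = [0.0] * 5
for k in range(5):
    mon.assign(y, "Y", k, 0.5 * k)
trace = mon.dump()        # what a post-mortem request would print
print("\n".join(trace))
```

Because the buffer is bounded, monitoring costs a fixed amount of storage however long the program runs, and nothing is printed unless the post-mortem asks for it.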

Now  is  a  good  time  to  compare  the  error-options  needed  by  the 
programmer  with  those  available  to  him.  He  may  want  to  assign  to  a 
specified anomaly, like 0.0**0, one of the following four consequences:

-0) Re-interpret the request in a way judged to be appropriate
for the majority of users (say 0.0**0 = 1.0) and continue
with no message nor error-trace.

1)  Re-interpret  the  request  as  above,  and  put  out  a  message  and 
error-trace  to  tell  the  programmer  what  happened  and  where, 
and  then  continue  execution. 
+0) Put out a message and error-trace to explain where and
why execution was terminated, and then grant any post-mortem
request that may have been made via

IF (KICKED(OFF)) ... .

2)  Transfer  control  to  a  location  designated  in  advance  by 
the  programmer  where  he  may  cope  with  the  anomaly  as 
he  pleases,  provided  the  necessary  information  is  easily 
accessible  to  him. 

Our  system  offers  at  least  two  of  the  first  three  options  for  most 
error  conditions.  The  last  option  is  dangerous  in  FORTRAN  for  the 
reasons  cited  while  discussing  the  limitations  of  KICKED,  unless  it  is 
handled  carefully.  The  following  discussion  explains  how  some  of  our 
library  programs  offer  option  2). 

Consider for example our least squares library subroutine LSTSQ
which, given a rectangular M x N matrix X and a column vector y,
attempts to find that coefficient vector c which minimizes the sum of
squares

$$S = (y - Xc)^T (y - Xc) = \sum_i \Bigl( y_i - \sum_j x_{ij} c_j \Bigr)^2 .$$

A  solution  c  always  exists  and  satisfies  the  normal  equations 

$$X^T X c = X^T y .$$
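The step from the sum of squares to the normal equations, which the text takes for granted, is the usual one of setting the gradient of S with respect to c to zero:

```latex
\begin{aligned}
S(c) &= (y - Xc)^{T}(y - Xc), \\
\frac{\partial S}{\partial c} &= -2\,X^{T}(y - Xc) = 0
\quad\Longrightarrow\quad X^{T}X\,c = X^{T}y .
\end{aligned}
```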

LSTSQ  tries  to  solve  these  equations  (in  double  precision,  because  that 
is  the  fastest  adequate  method  on  a  7094)  for  c  and  the  corresponding 
minimum  value  of  S  and,  if  requested,  the  inverse  matrix 

$$V = (X^T X)^{-1} .$$

But if the columns of X are nearly linearly dependent, in the sense that
there exists a perturbation $\Delta X$ of the order of a few units in the last
place of X such that the columns of $(X + \Delta X)$ are linearly dependent, then
the solution c is not well defined and LSTSQ produces one of two things
instead of c:

0)  If  the  user  wrote 

CALL  LSTSQ  (X,  M,  N,  Y,  C,  S)  or 
CALL  LSTSQ  (X,  M,  N,  Y,  C,  S,  V) 
then he has made no provision for the possibility that X
be  nearly  singular,  so  he  receives  a  suitable  diagnostic 
and  error-trace  and  is  kicked  off. 

1)  If  the  user  wrote 

CALL LSTSQ (X, M, N, Y, C, S, $n) or
CALL LSTSQ (X, M, N, Y, C, S, V, $n)

where n is an integer standing for a statement number,
LSTSQ returns control to statement number n in the user's
calling  program,  and  diagnostic  information  is  made  available 
in  V  (or  elsewhere  if  V  was  not  requested)  which  permits 
the calling program to identify the linear dependence relatively
easily and change X appropriately. (Usually the calling
program just decreases N.) LSTSQ does not put out any
messages  in  this  case. 

The  foregoing  description  is  somewhat  simplified;  details  can  be 
found  in  the  PRM.  The  interesting  feature  is  not  so  much  the  use  of  a 
FORTRAN  IV  error  return  $n  as  the  fact  that  this  error  return  is  optional. 
The  option  is  available  because  one  of  the  first  statements  executed  within 
LSTSQ  is 


CALL  ARGCNT  (I,  J) 

which  counts  the  arguments  supplied  in  the  CALL  to  LSTSQ.  I  is  the 
number  of  arguments  exclusive  of  error  returns,  and  J  is  the  number  of 
error  returns.  The  error  options  described  above  are  numbered  0  and  1 
according  to  the  value  of  J.  Similarly,  LSTSQ  determines  whether  the 
user wants $V = (X^T X)^{-1}$ or not according as I = 7 or 6 respectively. Any
other  values  of  I  or  J  indicate  an  error,  like  a  period  between  the  integers 
M  and  N  instead  of  a  comma,  which  is  serious  enough  to  terminate 
execution  with  an  appropriate  diagnostic. 
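The idea of an optional error return can be sketched in Python, where optional arguments play the part that ARGCNT plays in FORTRAN. Everything below is an illustrative assumption: the function `lstsq`, its keyword arguments, and the crude tolerance 1e-12 are not the original LSTSQ, which works in 7094 double precision and judges dependence in units in the last place of X.

```python
def lstsq(x, y, want_inverse=False, on_singular=None):
    """Solve the normal equations X^T X c = X^T y by Gauss-Jordan elimination.

    `on_singular` plays the role of the optional error return $n: if supplied
    it is called instead of terminating; if not, an exception is raised.
    """
    n = len(x[0])
    # Augmented matrix [X^T X | X^T y | I]; the identity block yields the inverse.
    a = [[sum(r[i] * r[j] for r in x) for j in range(n)]
         + [sum(r[i] * yi for r, yi in zip(x, y))]
         + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = max(abs(a[i][i]) for i in range(n)) or 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))   # partial pivoting
        if abs(a[p][k]) <= 1e-12 * scale:                  # crude dependence test
            if on_singular is None:
                raise ArithmeticError("columns of X are nearly dependent")
            return on_singular(k)
        a[k], a[p] = a[p], a[k]
        a[k] = [v / a[k][k] for v in a[k]]
        for i in range(n):
            if i != k:
                f = a[i][k]
                a[i] = [vi - f * vk for vi, vk in zip(a[i], a[k])]
    c = [row[n] for row in a]
    s = sum((yi - sum(ci * xi for ci, xi in zip(c, r))) ** 2 for r, yi in zip(x, y))
    inverse = [row[n + 1:] for row in a]
    return (c, s, inverse) if want_inverse else (c, s)

x = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
c, s = lstsq(x, [1.0, 3.0, 5.0])
bad = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]          # second column equals the first
outcome = lstsq(bad, [1.0, 2.0, 3.0], on_singular=lambda k: ("dependent", k))
print(c, s, outcome)
```

The caller who supplies no handler is stopped with a diagnostic, while the caller who supplies one regains control quietly, which is the distinction between options 0) and 1) above.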

The  use  of  variable  length  argument  lists  lends  a  certain  elegant 
simplicity  to  several  of  our  library  programs,  and  we  hope  that  this 
feature  will  be  incorporated  in  the  programming  languages  of  the  future. 
The simplicity with which the error return scheme can be implemented
makes it efficient and satisfactory for a wide range of applications, but
there are two important areas where the scheme is unsatisfactory. One
consists of those difficulties caused by a small lack of foresight and
recognized immediately with the slight assistance to hindsight provided
by a diagnostic. Many of the error conditions mentioned above, like
LOG(X) when LOG(ABS(X)) was intended, fall into this category. So do
many input/output problems. It suffices here to say that a lot more
could be said for the desirability and convenience of subprograms like
KIKOPT which allow the programmer to revise temporarily the execution
of his program at each of several spots without having to insert a
small explicit change at each spot.

The  second  area  where  error  returns  have  proved  unsatisfactory 
covers  Over/Underflow,  a  ubiquitous  phenomenon  to  which  the  next 
section  of  this  report  is  devoted. 

3. OVER/UNDERFLOW. Overflow and Underflow are what take
place in the arithmetic registers of a computer whenever an attempt is
made to calculate numbers outside the normal range. On the 7094, overflow
occurs whenever the magnitude of the result of a floating point
arithmetic operation equals or exceeds

$$2^{127} = 1.70141183 \times 10^{38} ;$$

underflow occurs whenever the magnitude is not exactly zero and is
smaller than

$$2^{-129} = .146936794 \times 10^{-38} .$$
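The two thresholds are easy to reproduce on a modern machine, whose double-precision range comfortably contains them. The Python sketch below only classifies magnitudes against the thresholds quoted above; the function name `classify` is a hypothetical helper, not part of the 7094 system.

```python
overflow_threshold = 2.0 ** 127      # results at or beyond this magnitude overflow
underflow_threshold = 2.0 ** -129    # nonzero results below this magnitude underflow

def classify(value):
    """Say how a 7094 result of this magnitude would be treated."""
    magnitude = abs(value)
    if magnitude >= overflow_threshold:
        return "overflow"
    if 0.0 < magnitude < underflow_threshold:
        return "underflow"
    return "in range"

print(f"{overflow_threshold:.8e}  {underflow_threshold:.8e}")
print(classify(1.0e38), classify(2.0e38), classify(1.0e-39), classify(0.0))
```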

Special  provision  must  be  made  to  cope  with  over/underflow  in  a  way 
which  does  not  produce  misleading  results. 

It  is  sometimes  argued  that  overflow  is  an  error  for  which  the  penalty 
should  be 


EXECUTION  TERMINATED 

but  this  penalty  would  place  an  intolerable  burden  upon  even  the  most 
expert  numerical  analyst.  He  is  often  unable  to  predict  in  advance  what 
the  range  of  numbers  will  be  in  complicated  calculations,  especially 
where  exponentials,  polynomials  and  rational  functions  of  high  degree, 
or  spaces  of  high  dimensionality  are  concerned.  For  example,  if  P(x,  y) 
is  a  polynomial  in  x  of  degree  10  whose  coefficients  are  wild  functions 
of  y,  then  the  desired  solution  x  =  X(y)  of  the  equation  P(x,y)  =  0  may 
be  well-defined  and  reasonable  even  though  it  is  inaccessible  unless  the 
polynomial- zero -finding  subprogram  is  allowed  to  pursue  a  flexible 
scaling  strategy  in  response  to  over/underflows,  if  any,  which  occur 
during  the  computation  of  P(x,y).  Overflows  should  not  force  kick-off; 
if  worst  comes  to  worst,  a  program  can  kick  itself  off  by  executing,  say, 


IF (OVFLOW) CALL UNCLE(0, 22H INESCAPABLE OVERFLOW.).
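One form that a flexible scaling strategy might take is to carry the exponent separately from the fraction while a polynomial is evaluated. The Python sketch below is an assumption-laden illustration, not the zero-finder referred to in the text: `horner_scaled` and `add_scaled` are hypothetical names, and the scaling is applied at every step instead of only in response to an overflow.

```python
import math

def add_scaled(m1, e1, m2, e2):
    """Return (m, e) with m * 2**e == m1 * 2**e1 + m2 * 2**e2, m normalized."""
    if m1 == 0.0:
        return m2, e2
    if m2 == 0.0:
        return m1, e1
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    # Align the smaller term to the larger exponent, then renormalize.
    m, shift = math.frexp(m1 + math.ldexp(m2, e2 - e1))
    return (m, e1 + shift) if m != 0.0 else (0.0, 0)

def horner_scaled(coeffs, x):
    """Evaluate a polynomial (highest degree first) as fraction * 2**exponent."""
    m, e = 0.0, 0
    for a in coeffs:
        pm, pe = math.frexp(m * x)          # |m| < 1, so m * x cannot overflow
        am, ae = math.frexp(a)
        m, e = add_scaled(pm, e + pe, am, ae)
    return m, e

small = horner_scaled([2.0, 3.0, 4.0], 5.0)          # 2*25 + 3*5 + 4
huge = horner_scaled([1.0] + [0.0] * 10, 1.0e200)    # x**10, far beyond the float range
print(math.ldexp(*small), huge)
```

The second call represents a value near 10**2000, which ordinary floating point cannot hold, yet its fraction and exponent remain available for a later rescaling or comparison.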


An opposite attitude of laissez-faire is reflected in the designs of
those machines whose hardware automatically replaces an overflowed
magnitude by a special digit pattern representing $\infty$ and then plunges on.
Such a scheme might well include, say, $\theta$ to replace an underflowed
magnitude and $\Omega$ to indicate an indeterminate value. These symbols
might obey rules like the following:

i) Whenever an arithmetic operation generates $\pm\infty$, $\theta$ or $\Omega$, a
corresponding flag is raised to indicate to the program that overflow,
underflow or lost significance respectively has occurred. If requested
by the programmer in advance, a message can be printed out for his
information.

ii) Any arithmetic operation with $\Omega$ as an operand generates $\Omega$ as a
result. $\Omega$ is also generated by the following expressions: $\infty - \infty$,
$\infty/\infty$, $0/0$, $0/\theta$, $\theta/0$, $\theta/\theta$, $\infty * 0$, $\infty * \theta$ and $x/\theta$.

iii) If x > (1 unit in the last place of the overflow threshold)
then $\infty - x = \Omega$; otherwise $\infty \pm x = \infty$.

If (1 unit in the last place of x) < (the underflow threshold)
then $\theta \pm x = \Omega$; otherwise $x \pm \theta = x \pm 0 = x$.

If |x| > 1 then $x * \infty = \infty * \mathrm{sign}(x)$; otherwise $x * \infty = \Omega$.

Similar rules hold for $x/\infty$, $\infty/x$, $x * \theta$ and $\theta/x$.
$x/0 = \infty * \mathrm{sign}(x)$ unless x = 0 or $\theta$.

iv) The number 0 can be generated only by direct assignment or as
the result of x - x with x neither $\theta$ nor $\infty$. The symbol $\theta$, which stands
for the set of all numbers smaller in magnitude than the underflow
threshold, can be generated only by direct assignment or by an
underflow as indicated above. During comparisons the symbol $\theta$
simultaneously satisfies

$$\theta \not> 0, \quad \theta \ne 0, \quad \theta \not< 0, \quad \text{and}$$

x > $\theta$ if and only if x > 0 too.
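Some of these rules can be tried out with the floating point arithmetic of a present-day language. The Python sketch below uses its built-in infinity and NaN as rough stand-ins for the overflow symbol and the indeterminate symbol; this correspondence is an assumption made for illustration, and Python has no counterpart of the separate underflow symbol.

```python
import math

inf = math.inf
indeterminate = inf - inf                     # yields NaN, the indeterminate stand-in

# Expressions listed under rule ii): each produces the indeterminate value.
rule_ii = [indeterminate, inf * 0.0, inf / inf]
print([math.isnan(v) for v in rule_ii])

# An indeterminate operand contaminates every later result.
print(math.isnan(indeterminate + 1.0))

# Infinities propagate with the appropriate sign.
print(inf + 1.0e300, -2.0 * inf)

# Underflow gives an ordinary 0.0 here, with no distinct symbol: the lack of
# distinction that the surrounding text goes on to criticize.
print(1.0e-200 * 1.0e-200)
```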

Rules like the foregoing are formidable, and have not been implemented
in any hardware known to the author (who would not expect to find them in
any machine except possibly one with interval-arithmetic built into the
hardware). But no other less elaborate rules are known to be foolproof.
For example, the CDC 6600's hardware follows similar rules whose most
obvious difference is the lack of any distinction whatever between underflow
to $\theta$ and the number 0. A comparable deficiency is to be found at
those IBM installations where, to escape a plethora of insignificant underflow
messages, all underflow messages are suppressed by many users
most of the time. The following segment of FORTRAN coding shows what
can happen when this is done. Here A, B, C, D and X are all positive
normalized floating point numbers (not special symbols nor zero).

Y  =  (A*X+B)/(C*X+D) 

Z = (A+B/X)/(C+D/X)

W  =  Y/Z 

WRITE  (...)  W 


Output:  W  =  1.98 

In  the  absence  of  any  indications  of  over/underflow,  how  is  this  phenomenon 
to  be  explained?  The  only  thing  unnatural  about  this  example  is  the  WRITE 
statement; W is more likely to have remained "out of sight, out of mind".
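One way to account for the phenomenon is that a single product, C*X, underflows silently to zero while D lies just above the underflow threshold. The Python sketch below simulates that situation; the function `flush` and the particular values of A, B, C, D and X are assumptions chosen for illustration, since the text does not give the data behind its output.

```python
UNDERFLOW = 2.0 ** -129            # 7094 underflow threshold quoted earlier

def flush(v):
    """Replace an underflowed magnitude by zero, with no indication."""
    return 0.0 if 0.0 < abs(v) < UNDERFLOW else v

def ratio(a, b, c, d, x, fl):
    """Compute W = Y/Z, passing every intermediate result through `fl`."""
    y = fl(fl(fl(a * x) + b) / fl(fl(c * x) + d))
    z = fl(fl(a + fl(b / x)) / fl(c + fl(d / x)))
    return fl(y / z)

# Illustrative values (assumptions); all are positive and within the 7094's range.
a, b, x = 1.0, 2.0 ** -64, 2.0 ** -64
d = 1.01 * UNDERFLOW
c = 0.98 * d / x                   # so that c * x falls just below the threshold

w_flushed = ratio(a, b, c, d, x, flush)
w_exact = ratio(a, b, c, d, x, lambda v: v)
print(round(w_flushed, 2), round(w_exact, 2))
```

With these values the silently flushed computation gives W close to 1.98, whereas the same expressions without flushing give W close to 1, although no operand is zero and nothing overflows.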

The  replacement  of  underflowed  numbers  by  zero  with  no  indication 
to  program  nor  programmer  is  a  clearly  unsatisfactory  practice.  And 
even  when  an  indication  of  over/underflow  is  given,  there  is  ample  reason 
to  protest  against  the  destruction  by  hardware  (as  on  the  IBM  360  and 
CDC  6600)  rather  than  software  of  information  which  could  otherwise  be  of 
significance to the programmer; this is discussed in more detail below
in connection with the Unnormalized Mode and the Counting Mode of treating
over/underflow. But, to be fair, it must be acknowledged that most
programmers would be satisfied most of the time by the provision of
representations for $+\infty$, $-\infty$, $\theta$ and $\Omega$ obeying rules like i) to iv) above.

What  more  might  a  numerical  analyst  demand?  From  time  to  time  he 
will want to generate and use numbers which lie beyond the over/underflow
thresholds.  And  certainly  no  programmer  wants  to  be  forced  to  check  for 
over/underflow  after  (much  less  before)  the  execution  of  each  arithmetic 
instruction  in  his  program,  and  to  decide  each  time  upon  an  appropriate 
course  of  action.  He  will  prefer  to  choose  one  of  the  several  modes  of 
execution  provided  for  him  by  the  system,  with  the  understanding  that  while 
the  program  is  being  executed  in  his  chosen  mode  each  over/underflow 
will  be  treated  according  to  the  rules  tabulated  for  that  mode.  Rules  i)  to 
iv)  above  could  define  one  such  mode.  The  programmer  should  be  allowed 
to  change  modes  between  one  line  of  his  program  and  the  next.  Ideally, 
he  should  be  allowed,  if  he  wants,  to  define  his  own  mode  by  specifying 
in  detail  just  what  rules  are  to  be  obeyed  for  each  type  of  arithmetic 
operation. Finally, although the programmer who is ignorant of the problems
of over/underflow must be warned when they occur, care must be
taken not to drown him in a cascade of over/underflow messages, especially
when they are obviously irrelevant. (An example of an obviously irrelevant
underflow is remainder underflow after a floating point division in a
FORTRAN program, which always discards the remainder.)

An  attempt  has  been  made  to  serve  as  many  of  these  needs  as  can  be 
served  in  a  FORTRAN  context  by  means  of  a  substantial  extension  of  the 
service  supplied  by  IBM  via  their  subprogram  .FPTRP  in  IBJOB.  This 
program  exploits  the  fact  that  whenever  a  floating  point  over/underflow 
occurs  the  7094  "traps";  it  interrupts  itself  and  transfers  control  to  a 
designated  core  location  after  setting  up  an  indicator  word  (cell  0)  to 
describe  what  caused  the  trap  and  where.  This  floating  point  trap,  FPT, 
takes  precedence  over  all  others  in  the  machine;  and  when  it  occurs  the 
registers  in  the  machine  contain  the  over/underflowed  result  unaltered, 
so  that  no  significant  information  is  lost.  A  hardware  option  can  be 
purchased (RPQ 880291) which includes improper divisions like 1/0 in the
scope  of  the  FPT. 


I  rewrote  .FPTRP  in  a  way  which,  while  maintaining  compatibility, 
increased  its  speed  and  augmented  its  capabilities  so  that  programs  can 
easily  choose  and  change  to  any  one  of  five  modes  of  execution.  The 
Standard  Modes  treat  over/underflow  very  much  as  IBM  did,  the  main 
difference  being  that  now  underflow  sets  up  an  indicator  the  same  way  as 
does  overflow.  The  Unnormalized  Modes  exploit  unnormalized  arithmetic 
to  permit  underflow  to  occur  "gently"  without  setting  up  distracting 
indicators or messages. The Silent Modes set indicators to indicate
over/underflow to the program but put out almost no messages for the
programmer; cascades of over/underflows in the Silent Modes do not slow programs
down  appreciably.  The  Printing  Modes  set  indicators  for  the  program  and 
also  report  each  indicated  over/underflow,  as  it  occurs,  in  a  printed 
message  for  the  programmer,  thus  helping  him  to  debug  his  program. 

The  Counting  Mode  allows  certain  kinds  of  computations  to  be  carried  out 
with  no  risk  of  over/underflow  because  the  allowed  range  of  magnitudes 
is  extended  to  include  numbers  like 
These five modes are discussed below in appropriately titled subsections
of  this  report.  The  last  two  subsections  discuss  improper  divisions  and 
simulated  over/underflows. 


THE  STANDARD  SILENT  MODE.  This  is  the  mode  in  which  the 
system  operates  by  default  in  the  absence  of  requests  for  some  other  mode. 
Whenever  a  floating  point  arithmetic  operation  overflows,  its  result  is 
replaced by the largest possible magnitude ($1.7014 \times 10^{38}$) with the same
sign, and this event is recorded by setting OVFLOW = .TRUE. . Whenever
a result underflows it is replaced by zero with the same sign, and
this event is recorded by setting UNFLOW = .TRUE. . The indicators
OVFLOW and UNFLOW are logical variables which can easily be
sensed, stored and/or reset to .FALSE. in several ways described
in the PRM. In particular, the declarations

LOGICAL  OVFLOW 
COMMON /OVFLOW/ OVFLOW

permit  statements  like 

IF  (OVFLOW)  ....  and 

OVFLOW = .FALSE.

to  be  executed  without  wasting  time  on  subprogram  linkages  in  short 
loops. 

This  mode  is  called  Silent  because  each  over/underflow  sets  its 
indicator  without  disturbing  the  programmer's  output  with  any  diagnostic 
message.  However,  just  after  his  program's  execution  is  terminated 
(either  normally  or  by  kick-off)  a  message  is  produced  to  draw  the 
programmer's attention to any over/underflows that escaped the attention
of his program; more about this later. In the Standard Silent Mode,
each over/underflow costs 15 to 30 microseconds; i.e. two to four division
times.
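The Standard Silent Mode can be imitated by post-processing each arithmetic result. The Python sketch below is a simulation under assumptions: the class `SilentMode` is hypothetical, and the constant `BIGGEST` assumes the 7094's 27-bit single-precision fraction, which is consistent with the figure $1.7014 \times 10^{38}$ but is not stated in the text.

```python
import math

OVER, UNDER = 2.0 ** 127, 2.0 ** -129
BIGGEST = (1.0 - 2.0 ** -27) * 2.0 ** 127     # assumed largest representable magnitude

class SilentMode:
    def __init__(self):
        self.ovflow = False                   # analogue of the indicator OVFLOW
        self.unflow = False                   # analogue of the indicator UNFLOW

    def result(self, v):
        """Treat one arithmetic result as the Standard Silent Mode would."""
        if abs(v) >= OVER:
            self.ovflow = True
            return math.copysign(BIGGEST, v)  # largest magnitude, same sign
        if 0.0 < abs(v) < UNDER:
            self.unflow = True
            return math.copysign(0.0, v)      # zero with the same sign
        return v

fp = SilentMode()
big = fp.result(1.0e30 * 1.0e30)
tiny = fp.result(-1.0e-30 * 1.0e-30)
if fp.ovflow:                 # analogue of IF (OVFLOW) ...
    fp.ovflow = False         # analogue of OVFLOW = .FALSE.
print(big, tiny, fp.ovflow, fp.unflow)
```

Here the program notices and resets the overflow indicator but leaves the underflow indicator set, so only the underflow would be reported when execution ends.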

THE  STANDARD  PRINTING  MODE.  This  mode  differs  from  the 
previous  mode  only  in  that  each  over/underflow,  as  it  occurs,  inserts  a 
message  into  the  programmer's  output  to  answer  the  following  questions: 

What  happened,  overflow  or  underflow? 

Which  machine  registers  are  involved;  AC,  MQ  or  both? 

What arithmetic operation was attempted; + , - , * , / ,
double-precision, ..., ? (An octal operation-code is
given here.)

What change was made in the affected register(s)?

Where is the instruction whose execution caused this
over/underflow? (An octal core address is given
here.)

Where in the source-program did all this happen?
(An error-trace is given here by our version of
.FXEM.)

We  also  considered  writing  out  the  operands  whose  sum,  product  or 
quotient had over/underflowed, but the cost of doing so seemed more than
the information was worth. This point deserves reconsideration. Anyway,
the  error-trace  usually  points  to  within  a  few  lines  of  the  site  of  the  over/ 
underflow  in  a  FORTRAN  program. 

The  over/underflow  handling  subprogram  .FPTRP  can  be  switched 
in  40  microseconds  from  a  Silent  Mode  to  the  corresponding  Printing 
Mode  via  the  statement 


CALL  NFPTST(M) 

with a positive integer expression M. When this statement is executed,
an internal counter N is set to M and .FPTRP is caused to operate
in a Printing Mode until M over/underflow messages have been put out.
N is decreased by 1 each time a message is put out, and when N becomes
0 an extra message

NOW OVER/UNDERFLOW MESSAGES ARE IN ABEYANCE

is produced and the Mode is switched back to Silent.

CALL NFPTST(0)

switches  the  Mode  back  to  Silent  without  any  extra  message. 
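The message counter behaves like a small state machine, which the following Python sketch imitates. The class `MessageCounter` and the wording of the individual messages are hypothetical; only the closing ABEYANCE message is taken from the text.

```python
class MessageCounter:
    """Printing Mode lasts until `n` messages have been put out."""
    def __init__(self):
        self.n = 0                     # 0 means Silent Mode
        self.output = []

    def nfptst(self, m):
        self.n = max(int(m), 0)        # m = 0 switches back with no extra message

    def report(self, what, where):
        if self.n == 0:
            return                     # Silent Mode: indicator only, no message
        self.output.append(f"{what} AT {where}")
        self.n -= 1
        if self.n == 0:
            self.output.append("NOW OVER/UNDERFLOW MESSAGES ARE IN ABEYANCE")

fp = MessageCounter()
fp.nfptst(2)
for site in ("LINE 17", "LINE 18", "LINE 19"):
    fp.report("OVERFLOW", site)
print("\n".join(fp.output))
```

The third overflow produces no message because the counter has already run out, so a cascade of over/underflows cannot flood the output.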

In  accordance  with  current  good  practice,  the  FORTRAN  programmer 
is allowed easily to sense, save, set and/or reset the message-counter
N as well as the indicators OVFLOW and UNFLOW. Details may be
found in the PRM. But programmers are advised not to set the latter two
logical variables to .TRUE. directly in a FORTRAN program; instead
they are advised to force an over/underflow like

DUMMY = (1.7E38)**2

This  is  done  because,  whenever  over/underflow  occurs,  .FPTRP  stores 
the  current  contents  of  SYSLOC  into  the  appropriate  indicator  to  make 
it .TRUE. . Later, when the program's execution is finished, the
monitor looks at each indicator to see whether it is .TRUE., and if so
then that indicator is interpreted as a pointer in roughly the same fashion
as .FXEM. interprets SYSLOC when providing the first line of the
error-trace  immediately  after  an  over/underflow  in  the  Printing  Mode. 
Consequently,  the  programmer's  output  finishes,  whenever  appropriate 
and  possible,  with  a  message  like 

LAST  UNREQUITED  OVERFLOW  WAS  IN  OR  AFTER 
LINE  17  OF  DECK  SUB2  . 
LAST UNREQUITED UNDERFLOW WAS IN A SUBPROGRAM
CALLED  IN  LINE  24  OF  DECK  SUB1. 


Often  the  programmer  can  deduce  from  the  information  given  here  that 
the  over/underflows  did  no  harm;  then,  since  the  messages  have  not 
tainted  his  formatted  output,  he  is  free  to  cut  them  off  and  publish  the 
rest. 

If  program  overlay  has  intervened  between  the  last  unnoticed  over/ 
underflow  and  program  termination,  or  if  the  indicators  OVFLOW  and 
UNFLOW were set to .TRUE. in a naive way, then the post-execution
message  may  describe  the  desired  deck-name  and  line  number  as 
UNKNOWN. 

It  is  especially  important  to  understand  that  the  word  "UNREQUITED" 
means  that  the  program  did  not  respond  to  the  over/underflows  and  then 
reset  the  indicators  to  .FALSE.  .  The  programmer  may  also  have 
received  several  printed  messages  to  notify  him  of  each  over/underflow 
that  it  ignored. 

I see now that we could have supplied, at little extra cost,
post-execution warnings more like this:

3943  OVERFLOWS  WENT  UNREQUITED  BY  THE  PROGRAM 

BETWEEN  LINE  17  OF  DECK  SUB2 

AND A SUBPROGRAM CALLED IN LINE 64 OF DECK SUB1.

Such  a  message  can  be  more  useful  in  deciding  whether  or  not  to 
ignore the over/underflows. Also, the counts of overflows and underflows
could be used by any programmer who, for reasons unclear to me,
wished  to  terminate  his  program's  execution  after  a  specified  number  of 
overflows  had  occurred.  Another  improvement  would  be  to  allow  a 
negative  value  for  M  in 


CALL  NFPTST(M) 

to  signify  that  -M  overflow  messages  are  to  be  allowed  while  all  underflow 
messages  are  to  be  suppressed.  Most  of  these  improvements  have  been 
incorporated  into  the  adaptation  of  our  scheme  for  the  Burroughs  B550O 
written  by  Mr.  Michael  D.  Green  at  Stanford  University  in  1966,  and  I 
expect  to  put  them  into  our  system  soon. 

THE TREATMENT OF UNDERFLOW. Some programmers have good
reasons to want to be informed about underflow. They may want to avoid
consequent loss of precision or subsequent division by zero. But most
programmers whom I asked said they preferred that underflowed numbers
be replaced by zero without their attention being distracted by the event.
This attitude was justified at a time when most over/underflow messages
reported "MQ UNDERFLOW" during an addition, subtraction, multiplication
or double-precision division. This message signified that the double-length
result of those operations in the AC-MQ register was small
enough to cause the characteristic of the less significant word in the MQ
to underflow even though the more significant word was correct. Since the
less significant word is entirely ignored in single-precision FORTRAN
expressions, and since the double-precision hardware of the 7094 ignores
the characteristic of the less significant word in double-precision expressions,
I decided that .FPTRP should simply ignore MQ underflow after
those operations where it was obviously irrelevant.* This decision's first
consequence was a welcome reduction in the number of messages and
complaints, especially where iterative calculations with residuals tending
to zero were concerned. The second consequence was that certain old
7090 programs, which had performed double-precision arithmetic by
simulating the 7094's double-precision hardware, ran into spurious overflow
troubles and required revision so that they would use instead of
simulate our machine's hardware. Fortunately, any user who insists upon
running a 7090 program unchanged upon our 7094 can do so in safety by
merely changing two well-marked instructions in .FPTRP . The second
instruction is needed to force appropriate action when remainders underflow
after division; otherwise they would be ignored too.

It is not good enough that the system ignores obviously irrelevant
underflows. Many irrelevant underflows are not obviously irrelevant.
Consider, for example, a segment of a typical matrix handling program
which computes

$$ r = b - \sum_i a_i x_i . $$

The usual rule, which replaces each underflowed sum or product by zero,
is satisfactory except when b and all the products $a_i x_i$ are so close to
the underflow threshold that the usual rule produces a significantly wrong
value for r. If all underflows are reported, how can the rare significant
reports be distinguished from the common ignorable ones? If no underflows
are reported, how can the rare incorrect values of r be distinguished
from the common correct ones? The easiest way I know to cope
with these questions is to use our Unnormalized Modes:


*The 27 significant bits in the MQ are neither ignored nor cleared when the
characteristic of the MQ underflows, so no accuracy is lost.

THE UNNORMALIZED SILENT MODE AND THE UNNORMALIZED
PRINTING MODE. These two modes differ from one another in just one
respect; the Printing Mode reports overflows in the way described under
the Standard Printing Mode above. The two Unnormalized Modes differ
from their corresponding Standard Modes only in the way they treat underflow.
A number, which in a Standard Mode would have underflowed to
zero and set UNFLOW = .TRUE. , is in an Unnormalized Mode replaced
by its closest unnormalized approximation and UNFLOW is unchanged.

For example, consider a decimal machine whose underflow threshold is
$.10000000 \times 10^{-38}$. In a Standard Mode, $.15743219 \times 10^{-40}$ would
underflow to zero, but in an Unnormalized Mode it is replaced by
$.00157432 \times 10^{-38}$. A number must now drop below $.00000001 \times 10^{-38}$
before it is silently replaced by zero.
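
The decimal example can be replayed with a small Python sketch. It is only an illustration of the two underflow rules on the hypothetical 8-digit decimal machine described above; the names `store` and `show` are invented here, the sketch handles positive values only, and it has nothing to do with the actual .FPTRP code.

```python
from fractions import Fraction

DIGITS = 8        # decimal digits in the fraction of the toy machine
MIN_EXP = -38     # smallest exponent; threshold is .10000000 x 10**-38

def store(value, unnormalized):
    """Store a positive value as (fraction_digits, exponent).

    The stored number equals fraction_digits * 10**(exponent - DIGITS).
    """
    value = Fraction(value)
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    # Find the exponent e for which 0.1 <= value / 10**e < 1.
    e = 0
    while value >= Fraction(10) ** e:
        e += 1
    while value < Fraction(10) ** (e - 1):
        e -= 1
    if e >= MIN_EXP:
        digits = round(value / Fraction(10) ** (e - DIGITS))
        if digits == 10 ** DIGITS:      # rounding carried into a new digit
            digits, e = digits // 10, e + 1
        return digits, e
    if not unnormalized:
        return 0, MIN_EXP               # Standard Mode: replace by zero
    # Unnormalized Mode: pin the exponent at MIN_EXP and let leading
    # zeros appear in the fraction.
    return round(value / Fraction(10) ** (MIN_EXP - DIGITS)), MIN_EXP

def show(stored):
    digits, e = stored
    return f".{digits:0{DIGITS}d} x 10**{e}"

x = Fraction("0.15743219e-40")
print(show(store(x, unnormalized=False)))
print(show(store(x, unnormalized=True)))
```

In the Unnormalized Mode the stored fraction keeps six of the eight digits of x; the two discarded digits are the price of the gentler underflow.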

In the Unnormalized Modes the range of nonzero floating point
numbers representable in the 7094 is extended downward from $2^{-129}$ to
$2^{-155}$ in single precision and $2^{-182}$ in double precision. This allows
underflow to take place more gently, and improves the accuracy of certain
results. But these benefits are secondary; the primary justification for
the Unnormalized Modes is that they ease the task of deciding, in certain
cases, whether a result is right or wrong.

For example, consider the following FORTRAN program to compute

$$ r = b - \sum_{i=1}^{N} a_i x_i . $$

(In accordance with good computing practice, and because it costs almost
nothing extra to do so on our 7094-II, the products of the single-precision
numbers $a_i$ and $x_i$ are accumulated to double precision before r is
rounded (not truncated) to single precision.)

      DOUBLE PRECISION D
      DIMENSION A(...), X(...)
      D = -B
C     ENTER THE UNNORMALIZED MODE. (14 MICROSEC.)
      CALL FPTUN
      DO 1 I=1,N
    1 D = A(I)*X(I) + D
C     RESTORE THE STANDARD MODE. (13 MICROSEC.)
      CALL FPTST
      R = 0.0 - RND(D)

The last statement rounds D to single precision, changes sign, and
adds zero before storing the result in R. If the rounded value of D is
a nonzero unnormalized number, then the normalization that always
follows addition will cause an underflow which, in the Standard Mode, will
set R = 0.0 and UNFLOW = .TRUE. . But if RND(D) is a normalized
number then adding zero will not change anything. Consequently, R is
correct as it stands, despite the possible underflows of intermediate results,
with the following exceptions:

- If OVFLOW or UNFLOW is .TRUE. , R is wrong.

- If severe cancellation has taken place in statement 1, R may
  be badly contaminated by double-precision truncation errors.
  This possibility is independent of over/underflow, and is
  irrelevant if B, A, and X are each uncertain by a unit in
  their respective last places.

- If R = 0.0 then it may be further contaminated by an error
  of $2^{\ldots}$ , although this is irrelevant if B is nonzero and
  uncertain by a unit in its last place. But if B = 0.0 then all
  the products A(I)*X(I) might have underflowed to zero
  silently.

There are very few applications where any but the first exception is relevant,
and that one is caught by the system. The absence of over/underflow
tests in the inner loop permits calculations in the normal range to proceed
with no noticeable loss of speed.

The Unnormalized Modes may be used in single precision, double
precision and complex arithmetic at the cost of 42 microseconds per
underflow. These modes would be much more useful on a 7094 but for a
quirk in the hardware which forces the "normalized" product of two nonzero
unnormalized numbers to be zero on certain occasions. The Unnormalized
Modes are best suited to those machines, like the Burroughs B5500,
which handle unnormalized operands without serious anomalies. But, because
of the peculiar behaviour of our machine, the Unnormalized Modes are so
beset by restrictions (for which see the PRM) that the author and a few of
his students are perhaps the only programmers who use them. We find
them valuable for computations with matrices, power series, and numerical
quadrature.

THE COUNTING MODE. This mode deals with over/underflow in a
way which permits programmers to save all the significant digits which
are lost by the other modes, and is specially useful for evaluating expressions
like

$$ q = \prod_i (a_i + b_i)/(c_i + d_i) $$

when q is likely to be a reasonable number even though its partial
products and quotients are afflicted with over/underflow. The execution
of

      CALL FPTCT(J) ,

where J is the name of an integer variable, switches .FPTRP in 14
microseconds to the Counting Mode and designates cell J to act as a
leftward extension for the 8-bit characteristics of the AC and MQ registers.
Henceforth, over/underflows are counted in J . Whenever an arithmetic
operation overflows its result is divided by $2^{256}$ and J is increased by 1.
Whenever an arithmetic operation underflows its result is multiplied by
$2^{256}$ and J is decreased by 1.

For example, the FORTRAN statements

      CALL FPTCT(J)
      J = 0
      X = (A+B)*(C+D)*(E/F)/G

produce a pair (J,X) whose values really satisfy

$$ (A+B)(C+D)(E/F)/G = 2^{256J} X . $$

In effect, the missing binary digits in X's characteristic have been added
to J while X's other significant binary digits have remained unchanged.

FORTRAN  programmers  who  use  the  Counting  Mode  must  be  reasonably 
familiar  with  the  workings  of  the  compiler  so  that  they  will  not  try  to 
evaluate  expressions  like 

A/(B+C)  nor  A*B+C  nor  A**B 

in  one  FORTRAN  statement. 

The following example shows how the Counting Mode is used to evaluate

$$ q = \prod_{i=1}^{N} (a_i + b_i)/(c_i + d_i) $$

for large N with no over/underflow tests inside the DO loops, although
each over/underflow does cost 35 microseconds.

      J = 0                            Initialize Over/Underflow Counter,
      PAB = 1.                         Numerator, and
      PCD = 1.                         Denominator.
      CALL FPTCT(J)                    Switch to Counting Mode.
      DO 1 I=1,N                       Compute Denominator using
    1 PCD=RND(PCD*RND(C(I)+D(I)))      Rounded Arithmetic.
      IF (PCD .EQ. 0.0) GO TO 3        ... because Denominator vanished.
      J = -J                           Reverse meaning of Counter.
      DO 2 I=1,N
    2 PAB=RND(PAB*RND(A(I)+B(I)))      Compute Numerator.
      Q = PAB/PCD
      CALL FPTST                       Switch back to Standard Mode.
      IF (Q .EQ. 0.0) J=0              ... because Numerator vanished.
      IF (J) 4, 5, 3
    3 ...     Q has Overflowed, because J > 0 or Denominator = 0.
    4 ...     Q has Underflowed, because J < 0 .
    5 ...     Q is correct as it stands, because J = 0 .


Whatever value J may have, and provided the denominator PCD is
nonzero, the stored value Q is related to the desired value q by

$$ q = 2^{256J} Q . $$
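
The bookkeeping behind this relation can be imitated in Python. The sketch below is an assumption-laden model, not the 7094 mechanism: it uses ordinary double-precision floats, treats $2^{127}$ and $2^{-129}$ as the overflow and underflow thresholds, and applies the counting rule in a helper `counted` (a name invented here) after every multiplication or division. It assumes every sum $a_i + b_i$ and $c_i + d_i$ is itself within that range.

```python
import math

SCALE = 2.0 ** 256       # one step of the counter J
OVER = 2.0 ** 127        # magnitudes >= OVER are treated as overflow
UNDER = 2.0 ** -129      # nonzero magnitudes < UNDER are treated as underflow

def counted(x, j):
    """Rescale x into range and adjust j so that the true value
    remains x * 2**(256*j)."""
    if abs(x) >= OVER:
        return x / SCALE, j + 1
    if x != 0.0 and abs(x) < UNDER:
        return x * SCALE, j - 1
    return x, j

def product_of_ratios(a, b, c, d):
    """Return (J, Q) with prod((a+b)/(c+d)) = 2**(256*J) * Q."""
    j = 0
    pcd = 1.0
    for ci, di in zip(c, d):                 # denominator first
        pcd, j = counted(pcd * (ci + di), j)
    if pcd == 0.0:
        raise ZeroDivisionError("denominator vanished")
    j = -j                                   # reverse meaning of counter
    pab = 1.0
    for ai, bi in zip(a, b):                 # then numerator
        pab, j = counted(pab * (ai + bi), j)
    q, j = counted(pab / pcd, j)
    if q == 0.0:
        j = 0                                # numerator vanished
    return j, q

# Ten factors of 1e30 / 5e29: every partial product leaves the modelled
# range, yet the quotient is an ordinary number and J returns to zero.
print(product_of_ratios([1e30] * 10, [0.0] * 10, [5e29] * 10, [0.0] * 10))
```

Because the rescaling factor is a power of two, it changes only the exponent of each intermediate result and never its significant digits, which is the property the text attributes to the Counting Mode.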


The  Counting  Mode  works  for  both  single  and  double  precision  arithmetic, 
and  is  indispensable  for  computing  determinants  and  certain  ratios  of 
factorials,  but  I  have  not  yet  figured  out  how  to  make  a  Complex  Counting 
Mode  work  with  comparable  elegance  on  our  machine.  However,  the  next 
example  is  one  where  our  Counting  Mode  is  useful  in  a  complex  arithmetic 
calculation. 

Suppose the complex array Z(I) is given and we seek K such that

$$ \mathrm{CABS}(Z(K)) = \max_{1 \le I \le N} \mathrm{CABS}(Z(I)) . $$

(Here CABS(Z) = |Z| in FORTRAN IV.) To avoid the square roots, we
may prefer to calculate only squared magnitudes, thereby exploiting the
equivalence between the statements

(i)  $|a + ib| > |u + iv|$

and

(ii)  $a^2 + b^2 > u^2 + v^2$ .

But the squared magnitudes may over/underflow even though the magnitudes
$|a + ib|$ and $|u + iv|$ are well within the machine's range. The following
program exploits the equivalence between (ii) above and

(iii)  $(a-u)(a+u) > (v-b)(v+b)$

and then copes with over/underflows via the Counting Mode. N is assumed
to exceed 1.

      COMPLEX Z(...), C, W
      DIMENSION ABC(2), UVW(2)
      EQUIVALENCE (C,ABC,A), (B,ABC(2)), (W,UVW,U), (V,UVW(2))
C     This EQUIVALENCE makes c=a+ib and w=u+iv .
      CALL FPTCT(J)
      K=1                              Initialize current maximum.
      C = Z(1)
      DO 5 I=2,N
      J=0
      W = Z(I)
      XL = (A-U)*(A+U)
      J = -J
      XR = (V-B)*(V+B)
      IF (XR .EQ. 0. .OR. XL .EQ. 0.) GO TO 3
      IF (J) 2, 3, 1
C     J>0 means |XR| should exceed |XL| , so ignore XL .
    1 IF (XR) 5, 5, 4
C     J<0 means |XL| should exceed |XR| , so ignore XR .
    2 IF (XL) 4, 5, 5
C     J=0 means XL and XR are directly comparable.
    3 IF (XL .GE. XR) GO TO 5
    4 K=I                              Update current maximum whenever
      C=W                              |W| > |C| .
    5 CONTINUE
      CALL FPTST

Now  C  =  Z(K)  is  the  largest  in  magnitude  of  the  values  Z(I)  .  Some 
minor  refinements  can  be  introduced  to  reduce  the  influence  of  roundoff 
in  critical  cases  of  near  equality,  but  they  do  not  change  the  relative  speed 
and  simplicity  exhibited  by  this  program  when  compared  with  alternatives. 
(For  more  details,  see  our  library  program  CMAXA  in  the  PRM. ) 
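
A rough Python analogue of this comparison is sketched below. It is not a transcription of the FORTRAN program or of CMAXA: in place of the Counting Mode it carries each product as a (fraction, exponent) pair obtained with `math.frexp`, which plays the same role of keeping the significant digits while the exponent is allowed to run outside the machine's range. The helper names are invented for this sketch, and it assumes the sums and differences of the real and imaginary parts do not themselves overflow.

```python
import math

def _scaled_product(p, q):
    """Return (m, e) with p*q = m * 2**e and 0.5 <= |m| < 1, or (0.0, 0)."""
    mp, ep = math.frexp(p)
    mq, eq = math.frexp(q)
    m, e = math.frexp(mp * mq)      # |mp*mq| lies in [0.25, 1) or is zero
    if m == 0.0:
        return 0.0, 0
    return m, e + ep + eq

def _less(x, y):
    """True when the scaled product x is smaller than the scaled product y."""
    exps = [e for m, e in (x, y) if m != 0.0]
    if not exps:
        return False
    top = max(exps)
    # Shift both to the larger exponent; the smaller one may underflow to
    # a signed zero, which does not disturb the comparison.
    xs = math.ldexp(x[0], x[1] - top) if x[0] != 0.0 else 0.0
    ys = math.ldexp(y[0], y[1] - top) if y[0] != 0.0 else 0.0
    return xs < ys

def index_of_largest(z):
    """Index K of the element of z with the largest magnitude."""
    k = 0
    a, b = z[0].real, z[0].imag
    for i in range(1, len(z)):
        u, v = z[i].real, z[i].imag
        xl = _scaled_product(a - u, a + u)
        xr = _scaled_product(v - b, v + b)
        if _less(xl, xr):           # (a-u)(a+u) < (v-b)(v+b), i.e. |c| < |w|
            k, a, b = i, u, v
    return k

z = [complex(1e200, 1e200), complex(1.5e200, 0.0), complex(-1e200, 1.2e200)]
print(index_of_largest(z))
```

For these data every squared magnitude exceeds the range of double precision, so a direct comparison of $a^2 + b^2$ with $u^2 + v^2$ would compare infinities.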

An attempt was made to extend the idea of FPTCT to cope with integer
overflows; i.e. we wanted to allow the FORTRAN programmer to designate
a cell which would act as a leftward extension of the integer accumulator
in the same way as J in FPTCT(J) acts as a leftward extension of the
floating point accumulator's characteristic. However, this scheme would
first have required certain modifications to the 7094 to permit trapping on
fixed point overflow, and then the FORTRAN IV compiler would have had
to be extensively rewritten. A frustrating feature of the present compiler
is that it renders certain integer overflows undetectable. Consequently,
FORTRAN programs which manipulate large integers are very much
complicated by the need for frequent overflow tests in advance of arithmetic
operations. The same complication afflicts ALGOL, and any other programming
language I know; it is the price we must pay for a lapse in
communication among the architects, implementors and users of a programming
language.

A similar lapse has frustrated attempts so far to implement the
Unnormalized and Counting Modes upon some other machines. The B5500
discards one of the digits in the characteristic of an over/underflowed
result, thereby preventing any analysis from determining whether the result
over/underflowed by a little or by a lot. The IBM 360 series wantonly
destroys everything, including the sign of an overflowed result*. The CDC
6600 has its own fixed ideas about over/underflow. The tendency of other
high-performance machines, like the IBM 360/91, to suffer from imprecise
interrupts implies that those machines will have to deal with over/underflow
entirely in their hardware. This in turn implies that their treatment of
over/underflow will be intolerable unless numerical analysts act soon to
lay down reasonable guidelines for machine designers to follow.

*This sentence was true when it was written; meanwhile IBM has promised
to remedy the 360's treatment of over/underflow in a way that may well
permit the schemes described here to be copied on the 360's other than
360/91.
W.K. May 1967

IMPROPER DIVISIONS. On a 7094 with divide-check-trap hardware,
improper divisions do not turn on the divide-check indicator. Instead they
trap to .FPTRP which, in our system, responds as illustrated below.

1.0/0.0 = $1.7014 \times 10^{38}$ and Overflow occurs.

Any floating point division (single precision, double precision,
or complex) of a nonzero number by zero is treated as a
quotient overflow and sets OVFLOW = .TRUE. . No provision
has been made to distinguish such divisions by zero from other
quotient overflows (except in the Counting Mode, where a message
can be produced) because both events almost always have the
same causes and consequences. Besides, the programmer can
easily (and should) test directly whether a divisor is zero or not.

Each division by zero consumes more than thrice as much time
as any other overflow.

1/0 = Kickoff unless otherwise has been requested.

Fixed point integer division by zero is almost certainly a drastic
error in a FORTRAN program. In ALGOL the issue might not
be so clear.

0.0/0.0 = Kickoff unless otherwise has been requested.

Floating point division of zero by zero is a symptom of a serious
flaw in the analysis behind a program.

Unnormalized Division may kick off unless otherwise has been requested.

Floating point division by an unnormalized number causes a trap
(unless the quotient produced by the hardware happens to be correct).
This is a symptom of certain programming errors like

- reference to a variable whose value has not previously been set,
- ALOG(3) instead of ALOG(3.0),
- a forgotten EQUIVALENCE (A,I) ,
- reference to A(13) when DIMENSION A(6) , or
- a significant underflow in an Unnormalized Mode.

After the new .FPTRP was installed, failures began to show up in
programs which had previously been allowed to proceed silently with a
zero quotient for each improper division. A few programmers protested
that they liked the old ways better, but they seem to represent a lunatic
fringe among programmers as a whole. The author is under the impression
that the new .FPTRP's treatment of improper divisions is more widely
appreciated than all his other works put together; actually the credit should
be shared with R. Jones and J. Bell, who found a way to simulate the
divide-check-trap hardware on a 7094 without that equipment. (The equipment
is soon to be installed, and with it will come some system simplification.)

However, the most important contribution made by the new .FPTRP is
that a programmer who has to cope with a complicated numerical problem
can still write whatever program first comes into his mind, just as he did
before. And now he will rest assured that, should his algorithm be
frustrated by over/underflow, he will find out what happened and, perhaps,
be able to cope with his difficulty by simply re-coding a small part of his
program instead of laboriously devising a deeper mathematical analysis
of his problem. The new .FPTRP strengthens the programmer's most
valuable tool, hindsight.

SIMULATED OVER/UNDERFLOW IN LIBRARY PROGRAMS. The
concept of over/underflow is normally associated with the elementary
arithmetic operations, but it takes no imagination to extend the concept
from simple functions of X like

A+X , A*X , A/X , X**2

to more complicated functions like

LOG(X) , EXP(X) , COT(X) .

In general, it seems reasonable to associate overflow with the following
behaviour:

as $x \to x_0$ ($x_0$ may be $\pm\infty$), $f(x) \to \pm\infty$ .

e.g. as $x \to 0+$ , $\log(x) \to -\infty$ ;
     as $x \to +\infty$ , $\exp(x) \to +\infty$ .

And underflow might just as reasonably be associated with this behaviour:

as $x \to \pm\infty$ , $f(x) \to 0$ .

e.g. as $x \to -\infty$ , $\exp(x) \to 0$ .

But we should not like to associate underflow with the value log(1) = 0. In
other words, underflow occurs only when the value of the function f(x)
is not zero though closer to zero than the underflow threshold.

Here are some examples to illustrate how our functions behave in
FORTRAN:

LOG(0.0)          =  -1.7014 E38     and OVFLOW is set
COT(+0.0)         =  +1.7014 E38     OVFLOW
EXP(3000.)        =   1.7014 E38     OVFLOW
EXP(-3000.)       =   0.0            UNFLOW
(+0.0)**(-3.0)    =  +1.7014 E38     OVFLOW
0.0**(-3.0)       =   1.7014 E38     OVFLOW
(-100.)**(-25)    =  -0.0            UNFLOW

The last example is interesting because the IBM program signals
overflow during the computation; we avoid overflow by computing
(1./100.)**25 instead of 1./(100.**25) . The previous two examples
should not be confused with

0**(-3) = Kickoff , code 25 ;

the distinction is consistent with the rules for improper divisions. Finally,
no underflows occur when LOG(1.0) = 0.0 or when SINPI(X) = $\sin \pi X$
vanishes for integer values of X.
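
The choice between (1./100.)**25 and 1./(100.**25) can be illustrated by a short Python sketch. It is a model only: the thresholds $2^{127}$ and $2^{-129}$ stand in for the 7094's single-precision limits, the function `power` and its flag strings are invented for this illustration, and a nonzero base is assumed. A negative exponent is handled by inverting the base first, so that a result too small to represent is reported as an underflow and no overflow is met on the way.

```python
import math

OVER = 2.0 ** 127      # stand-in for the overflow threshold (about 1.7014e38)
UNDER = 2.0 ** -129    # stand-in for the underflow threshold

def power(x, n):
    """Return (value, flag) for x**n with integer n and nonzero x.

    flag is "OVFLOW", "UNFLOW" or None.
    """
    base = abs(x) if n >= 0 else 1.0 / abs(x)   # invert first when n < 0
    sign = -1.0 if (x < 0 and n % 2) else 1.0
    result = 1.0
    for _ in range(abs(n)):
        result *= base
        if result >= OVER:
            return sign * OVER, "OVFLOW"
        if 0.0 < result < UNDER:
            return sign * 0.0, "UNFLOW"
    return sign * result, None

value, flag = power(-100.0, -25)
print(value, flag)     # a negatively signed zero together with "UNFLOW"
```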

I have rewritten several of the elementary function subprograms in the
IBLIB library to ensure that their over/underflow behaviour is consonant
with the foregoing. When necessary, over/underflow is simulated; this
merely means that a transfer to .FPTRP is forced in such a way that the
FPT indicator word (cell 0) contains just the information needed for the
desired message from .FPTRP . The simplest way to do this in a
FORTRAN program is to square a very large or very small number. Of
course, .FPTRP must be operating in one of its Standard Modes to allow
such simulated over/underflows to produce their intended effects. If the
Printing Mode is in use, as it should be while a program is being debugged,
then the error-trace points to the function which caused the apparent
over/underflow; otherwise the post-execution message may sometimes identify
that function. As far as I can see, no vital information is lost by thus
failing to discriminate between the simulated over/underflows and the others.
The user's view of the library programs becomes less cluttered by their
various demands for valid arguments. And the system gains several
storage locations vacated by superfluous messages.

However,  some  programmers  claim  that  one  desirable  capability  has 
been  lost.  For  example,  they  would  prefer  to  be  able  to  write 

CALL  KIKOPT  (9 , 0) 

in  their  main  program  whenever  they  want  references  to  LOG(X)  in  all 
their  subprograms  to  cause  kickoff  when  X  =  0.  0  .  My  scheme  requires 
that  each  appearance  of  LOG(X)  be  preceded  by  something  like 

IF  (X  .EQ.  0.0)  CALL  UNCLE  (9,  18H  LOG(X=0.  0)  ERROR)  . 

I  think  that  programs  written  the  second  way  are  easier  to  read  and  to  debug; 
but  anyone  who  wants  to  live  dangerously  can  easily  change  the  library 
programs  to  suit  himself  because  their  listings  are  usually  amply  supplied 
with  comments. 

A more penetrating criticism of my scheme is that it denies too many
users  the  valuable  education  obtained  by  reading  certain  IBM  diagnostics. 
For  example,  increasingly  many  of  our  users  have  too  little  familiarity 
with  the  rate  of  growth  of  exp(x)  to  appreciate  that  exp(88.0297)  exceeds 
the  overflow  threshold.  Our  university  used  to  include  a  professor  whose 
first  assignment  to  freshman  physics  students  was  to  plot  a  graph  of 
exp(x)  for  0  <  x  <  10  .  His  attitude  might  well  serve  as  an  example  for 
the  socially  acceptable  computer  systems  of  the  near  future. 

The extension of a comprehensive treatment of over/underflow over
the entire library of numerical subprograms is an enormous task prodigiously
demanding of attention to detail. Here is a simple example of a
typical detail. The CABS function computes the absolute value of a
complex variable using the formulae

$$ |a + ib| = \begin{cases} |a| \sqrt{1 + (b/a)^2} & \text{if } |a| \ge |b| , \\ |b| \sqrt{1 + (a/b)^2} & \text{if } |b| \ge |a| . \end{cases} $$

For simplicity assume the former case. Then underflow will occur during
the computation of $1 + (b/a)^2$ whenever $(b/a)^2$ is nonzero but smaller
than the underflow threshold. This underflow is irrelevant, so our CABS
program suppresses it. Had the program been written in FORTRAN the
suppression would have been accomplished by computing $1 + (b/a)^2$ in
the Unnormalized Mode. Similar but more complicated considerations
affect the division of one complex number by another.
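
The two formulae can be sketched in a few lines of Python. This is an illustration in IEEE double precision rather than the library's CABS: the function name `cabs` is reused only for readability, and the irrelevant underflow of $(b/a)^2$ is harmless here simply because Python lets it go quietly to zero.

```python
import math

def cabs(a, b):
    """Magnitude of a + ib without forming a**2 or b**2."""
    a, b = abs(a), abs(b)
    if a < b:
        a, b = b, a              # now a >= b >= 0
    if a == 0.0:
        return 0.0
    t = b / a                    # 0 <= t <= 1
    # t*t may underflow to zero; the sum 1 + t*t is then exactly 1,
    # which is the correctly rounded value anyway.
    return a * math.sqrt(1.0 + t * t)

print(cabs(3e200, 4e200))        # a**2 + b**2 would overflow here
print(cabs(1e-300, 1e10))        # (b/a)**2 underflows harmlessly here
```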

The task of taming over/underflow in the library is not yet completed;
there are several relatively rarely used programs that remain to be revised.
Is this project worth its price? Who should say? Our users can no longer
offer a qualified opinion because so few of them are now aware of the issues,
and even those few hardly ever have trouble dealing with over/underflow
nowadays.

ROUNDING ERROR, ILL-CONDITIONING, AND INSTABILITY*


Ben  Noble 

Mathematics  Research  Center,  U.S.  Army 
University  of  Wisconsin,  Madison,  Wisconsin 

1. INTRODUCTION. Modern digital computers perform so much arithmetic so
rapidly that we can print out only a minute fraction of the results generated
within the machine. One of the characteristics of digital computers is that
they give a definite answer to everything you ask them to do, whether the
answer is right or wrong. The challenge is to write programs in such a way
that computations are in some sense self-checking. The more usual situation
is that we try as far as possible to incorporate checks, but the printout
makes us suspect that something is wrong. How do we locate the source of
the trouble?

The  theme  of  this  paper  is  that  it  is  convenient  to  subdivide  sources  of 
difficulty  into  three  more  or  less  distinct  categories.  (We  go  into  detail 
in  connection  with  examples  later.) 

(a)  Existence  and  uniqueness.  It  is  pointless  to  look  for  a  unique 
solution  to  a  problem  if  there  is  no  solution  or  an  infinity  of  solutions. 

If there is an infinity of solutions we may be able to characterize the
multiplicity  of  solutions  in  a  definite  way.  If  there  is  no  solution  we  may 
have  to  look  for  some  approximate  solution,  for  example  least-squares  or 
minimax. 

(b)  Ill-conditioning.  Some  problems  are  very  sensitive  to  small  changes 
in  the  initial  data.  This  is  a  characteristic  of  the  problem  itself,  and 
not  of  the  method  used  to  solve  it. 

(c)  Instability.  Some  methods  for  computing  the  answer  to  a  given 
problem  may  be  numerically  unstable  and  give  nonsensical  results,  whereas 
other  methods  for  the  same  problem  may  be  stable  and  give  accurate  results. 
Instability  is  a  characteristic  of  the  method  used  to  solve  the  problem,  not 
of  the  problem  itself. 

The terms "ill-conditioned" and "unstable" are not always used in exactly
these senses in the literature - in particular they are often defined precisely
in connection with a particular problem or method. In our usage, the important
distinction is that "ill-conditioning" is a property of the problem and
"instability" is a property of the method.

If  a  problem  has  a  well-defined  solution  that  is  well-conditioned  (i.e., 
not  sensitive  to  small  changes  in  the  given  data)  we  say  it  is  well-posed. 
Otherwise  it  is  ill-posed.  The  property  of  being  well-posed  or  ill-posed  is 
a  characteristic  of  the  problem  itself,  not  of  the  method  used  to  solve  it. 


*  Work  performed  under  Contract  No.:  DA-31-124-ARO-D-264 

One of the reasons why the distinction between existence and uniqueness,
ill-conditioning, and instability is convenient is that it corresponds to
three stages in the analysis of any given problem:

(i) We cannot compute intelligently until we understand what to look for -
a unique solution, a family of solutions, or some kind of an approximate solution.
Also a discussion of singular cases will tell us where we should expect
difficulties. In general we are likely to be in trouble in situations where we have
"nearly" multiple solutions or no solutions at all. If the mathematical theory
is inadequate we may be forced into arguments like 'the physical situation has
a unique solution so the equations are likely to have a unique solution'.
Unfortunately there is not necessarily a one-one correspondence between the
physical situation and the mathematical model.

(ii) Once we understand the existence-uniqueness question we can proceed
to an analysis of the condition or sensitivity of the problem. Ideally this
will tell us when to expect ill-conditioning, and how to recognize it in
practice. If a problem is ill-conditioned, the results of a computation are
likely to be inaccurate due to rounding errors. Many computers tend to accept
ill-conditioning as an act of God. A more satisfactory attitude is to regard
it as man-made, and try to develop some ingenious method for avoiding the
ill-conditioning, insofar as this is not inherent in the original situation that
gave rise to the equations we are trying to solve - for instance, it is sometimes
possible to invent a purely mathematical trick as in the least-squares
example in §4, or sometimes the physical problem can be reformulated as in
the chemical experiment mentioned in §5. To quote J.W. Tukey, "If a job is
not worth doing, it is not worth doing well". The accurate solution of an
ill-conditioned problem may fall into the class of jobs that are not worth
doing, since the results may be meaningless if the initial data are not
accurately specified.

(iii)  Having  understood  the  problem  from  a  theoretical  point  of  view, 
we  should  be  in  a  position  to  decide  which  algorithm  to  use  to  compute  the 
solution.  One  of  the  important  properties  of  an  algorithm  is  that  it  should 
be numerically stable. In particular it should not produce spurious solutions
and it should not be unduly influenced by rounding errors. An unstable
method will be sensitive to rounding errors even though the problem we are
trying  to  solve  is  itself  well-conditioned. 

It  should  not  be  necessary  to  remind  the  reader  that,  after  all  this 
preliminary  work  has  been  done,  no  matter  how  satisfactory  the  theory,  it 
is still essential to incorporate checks in programs. When solving differential
equations by step-by-step methods, one can perform runs for various
step-lengths and check that these give consistent answers. When solving
simultaneous  equations  one  can  check  pivots,  and  so  on.  Checks  of  this 
type  ought  to  be  second  nature.  Unfortunately  many  programmers  act  like 
the  housekeeper  who  refuses  to  count  up  her  housekeeping  bills  more  than 
once  -  because  she  always  obtains  a  different  answer  the  second  time. 


2.  THE "NOISE-LEVEL" OF A CALCULATION. One of the fundamental limitations
inherent in computing is that numbers are specified to a limited degree of
accuracy. It will suffice for our purposes to consider floating-point computations
with numbers to the base 10, i.e., a number x is represented in the form
$x = 10^p q$, where p is the exponent and q is the fractional part. The number q is
normalized so that $0.1 \le q < 1$, and q is specified to a given number of
significant  figures.  The  result  of  a  calculation  (e.g.,  an  addition  or  a 
multiplication)  is  first  normalized  and  then  rounded  so  that  q  always  has  the 
same  number  of  digits  to  the  right  of  the  decimal  point. 

In  most  cases  it  is  impractical  to  trace  the  rounding  errors  in  detail 
through  a  calculation.  Fortunately  the  overall  effect  of  rounding  errors 
can  be  summarized  in  a  simple  way.  We  illustrate  by  means  of  a  simple  example. 
Suppose  that  we  wish  to  evaluate 


$$ f(x) = x - 1000 \{ (x + 0.1)^{1/2} - x^{1/2} \}. \tag{1} $$


(We  forestall  a  comment  by  the  expert  in  numerical  analysis,  that  this 
particular  calculation  can  be  rearranged  so  that  the  rounding  error  is  reduced. 
This  remark  is  irrelevant  here  since  we  wish  to  illustrate  what  can  happen 
when  rounding  effects  are  serious.)  On  evaluating  f(x)  to  four  significant 
figures,  we  have,  for  example,  using  the  rules  for  floating-point  described 
in  the  last  paragraph  but  not  floating  point  notation, 

    f(13.40) = 13.40 - 1000 (3.674 - 3.661) = 13.40 - 13.00 =  0.40

    f(13.50) = 13.50 - 1000 (3.688 - 3.674) = 13.50 - 14.00 = -0.50

    f(13.60) = 13.60 - 1000 (3.701 - 3.688) = 13.60 - 13.00 =  0.60

These  results,  together  with  similarly  computed  values  of  f(x)  for  x  at 
intervals  of  0.1  from  x  =  11.0  to  16.0  are  plotted  in  Figure  1.  (The  lines 
joining  the  points  are  of  course  inserted  only  to  help  the  eye.)  A  curve 
representing  the  exact  value  of  f(x),  obtained  by  using  a  large  number  of 
significant  figures  in  the  calculation,  is  also  included.  It  is  seen  that 
the  results  obtained  by  using  four  significant  figures  fluctuate  in  a  more 
or  less  random  way  about  the  true  f(x).  The  reason  why  these  fluctuations 
are  so  large  in  this  case  is  that  there  is  a  serious  loss  of  accuracy  because 
of the subtraction of nearly equal numbers. The more or less random
fluctuations of the computed values around the exact curve, as illustrated
in Figure 1, are analogous to "noise" in electrical networks.

Mathematically,  these  results  can  be  stated  in  a  convenient  form  by 
saying  that  if  f(x)  is  the  exact  value  of  a  function,  and  f*(x)  is  the  value 
obtained  by  evaluating  the  function  on  a  computer,  using  a  given  number  of 
significant  figures,  then 

$$ |f^*(x) - f(x)| < \varepsilon, \tag{2} $$

where the value of ε cannot be taken smaller than a certain irreducible
minimum, depending on the precise way in which the calculations have been
performed. Thus in the above example the value of f*(x) falls within the
two dotted lines, and the minimum permissible value of ε is, by estimation
from the graph, 0.85. This is an empirical estimate of ε. The quantity
ε is loosely referred to as the noise-level of the calculation, by analogy
with fluctuations in electrical networks, for example.
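
The four-figure evaluation of (1) can be imitated on a modern machine. The
following is only a sketch: it assumes that Python's decimal module, with the
working precision set to four significant figures, is an acceptable stand-in
for the floating-point rules described above, and it treats a double-precision
evaluation as "exact". The function names are illustrative and are not part of
the original discussion.

```python
from decimal import Decimal, localcontext
import math

def f_rounded(x, digits=4):
    """Evaluate f(x) = x - 1000*(sqrt(x + 0.1) - sqrt(x)), rounding every
    intermediate result to `digits` significant figures."""
    with localcontext() as ctx:
        ctx.prec = digits
        x = Decimal(x)
        diff = (x + Decimal("0.1")).sqrt() - x.sqrt()
        return x - 1000 * diff

def f_exact(x):
    """Reference value in double precision (treated here as 'exact')."""
    x = float(x)
    return x - 1000.0 * (math.sqrt(x + 0.1) - math.sqrt(x))

def empirical_noise_level(lo=110, hi=160):
    """Largest |f*(x) - f(x)| over x = lo/10, (lo+1)/10, ..., hi/10."""
    worst = 0.0
    for k in range(lo, hi + 1):
        x = Decimal(k) / Decimal(10)
        worst = max(worst, abs(float(f_rounded(x)) - f_exact(x)))
    return worst

for x in ("13.40", "13.50", "13.60"):
    print(x, f_rounded(x), round(f_exact(x), 3))
print("empirical noise-level:", empirical_noise_level())
```

The three printed four-figure values can be compared with the hand computation
above, and the last line gives a numerical counterpart of the estimate of ε
read from the graph.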

3.  POLYNOMIAL  EQUATIONS.  To  illustrate  some  of  the  general  remarks 
made  in  §1,  consider  the  problem  of  finding  the  roots  of  a  polynomial 
equation: 

$$ a_0 z^n + a_1 z^{n-1} + \dots + a_{n-1} z + a_n = 0, $$

where  the  coefficients  are  real.  The  mathematical  theory  for  this  equation 

tells  us  that  the  following  possibilities  exist.  We  assume  that  n  is  an 
integer  greater  than  or  equal  to  zero. 

(α) n = 0, $a_0 = 0$. The equation then reads 0 = 0 and any z is a solution.

(β) n = 0, $a_0 \ne 0$. The equation is then contradictory, since it says that
$a_0 = 0$. No solution exists.

(γ) n > 0, $a_0 \ne 0$. The equation has exactly n roots. Complex roots
occur in conjugate pairs.

One of the important things here is that we would normally regard cases
(α) and (β) as trivial, but they will give trouble on a computer unless they
are  allowed  for  in  the  computer  program.  In  a  general  purpose  program  we 
must  allow  for  all  possibilities,  and  an  existence-uniqueness  discussion  helps 
us  to  understand  what  these  possibilities  are. 

It  is  difficult  to  deal  with  zero  and  infinity  when  using  an  automatic 
computer.  In  place  of  infinity  we  have  a  finite  upper  limit  to  the  numbers 
that  can  be  represented  within  the  machine.  In  place  of  zero  we  usually  find 
some small number that has been introduced by rounding errors. We can tell
the  machine  that  numbers  below  a  certain  limit  should  be  regarded  as  zero, 
but  we  have  to  be  careful  about  scaling  since,  in  floating  point,  numbers 
always  carry  the  same  number  of  significant  figures,  and  a  number  that  is 
small  compared  with  unity  can  have  a  small  relative  error.  Similarly  the 
mere  fact  that  a  number  is  large  is  no  guarantee  that  it  should  be  regarded 
as infinite. These difficulties become acute when we try to produce subroutines
that will cope automatically with all eventualities. For a
discussion in connection with the solution of quadratic equations (where the
problem is already by no means trivial) see [1].

To  discuss  condition,  consider  the  general  equation  f(z)  =  0.  (This 
covers  transcendental  as  well  as  polynomial  equations.)  Suppose  that  there 
is a repeated root of multiplicity k given by $z = z_0$. Suppose that a
Taylor series expansion of f(z) exists near $z = z_0$, so that we can write

$$ f(z) = \frac{(z - z_0)^k}{k!} f^{(k)}(\zeta), \tag{3} $$

where ζ is a number that tends to $z_0$ as z tends to $z_0$. Consider the roots
of $f(z) + \varepsilon g(z) = 0$, where ε is a small parameter. Let Z be a root of this
new equation that tends to $z_0$ as ε tends to zero. For small ε, equation (3)
gives, approximately,

$$ \frac{(Z - z_0)^k}{k!} f^{(k)}(z_0) + \varepsilon g(z_0) \approx 0, $$

or

$$ Z \approx z_0 + \left\{ -\frac{k!\, g(z_0)}{f^{(k)}(z_0)} \right\}^{1/k} \varepsilon^{1/k}. $$

This expression tells us several things:

(1) It is clear that the multiple roots automatically tend to be ill-conditioned.
Thus if k = 2 and $\varepsilon = 10^{-10}$, we have $\varepsilon^{1/2} = 10^{-5}$, which
is very much larger than ε.

(2) Consider the special case k = 1, i.e., $z_0$ is a simple root of f(z) = 0.
Then

$$ Z \approx z_0 - C\varepsilon, \qquad C = g(z_0) / f'(z_0). \tag{4} $$

The root will be ill-conditioned if $g(z_0)/f'(z_0)$ is large. This commonly
occurs when there is another root close to $z_0$. (If there is a repeated root,
then of course $f'(z_0) = 0$ and k > 1.) We are tempted to say that roots that
are close together will be ill-conditioned. However the situation is more
subtle than this. (The following examples are taken from [7] pp. 41-47,
where further details can be found.) Consider the polynomial of degree 20
with roots z = 1, 2, ..., 20, i.e. the expanded form of

$$ (z-1)(z-2) \cdots (z-20). $$

If we work out the coefficient of ε in (4) for the root $z_0 = 16$ and
$g(z) = z^{19}$ we find

$$ |C| \approx 0.24 \cdot 10^{10}. $$

Hence the root $z_0 = 16$ is very ill-conditioned even though we might think
that the roots of the polynomial equation are reasonably well separated.
(The small roots of this polynomial equation are well-conditioned.) The
roots of $z^{20} = 1$, namely the twentieth roots of unity, are equally spaced on
a circle of unit radius. We might think that these are very close together,
and therefore likely to be ill-conditioned. The reverse is the case. Using
(4) we find

$$ |C| = 1/20 $$

for all the roots, so that all the roots are well-conditioned.
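
The two values of |C| quoted above can be checked directly from the
definition C = g(z_0)/f'(z_0). The sketch below assumes g(z) = z^19 in both
cases (the text states this only for the first polynomial) and uses exact
rational arithmetic for the polynomial with roots 1, 2, ..., 20, so that the
check is not itself spoiled by rounding. The function names are illustrative.

```python
from fractions import Fraction
import cmath

def wilkinson_C(root, n=20):
    """C = g(z0)/f'(z0) for f(z) = (z-1)(z-2)...(z-n) and g(z) = z**(n-1).
    At a root z0, f'(z0) is the product of (z0 - j) over the other roots j."""
    fprime = 1
    for j in range(1, n + 1):
        if j != root:
            fprime *= (root - j)
    return Fraction(root ** (n - 1), fprime)

def unity_C(k, n=20):
    """C for f(z) = z**n - 1 and g(z) = z**(n-1) at the k-th n-th root of unity."""
    z0 = cmath.exp(2j * cmath.pi * k / n)
    return z0 ** (n - 1) / (n * z0 ** (n - 1))

print("root 16 of (z-1)...(z-20): |C| =", float(abs(wilkinson_C(16))))
print("root  1 of (z-1)...(z-20): |C| =", float(abs(wilkinson_C(1))))
print("a twentieth root of unity: |C| =", abs(unity_C(3)))
```

The second printed value illustrates the remark that the small roots of the
degree-20 polynomial are well-conditioned.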

(3) Consider the case of a double root, k = 2. Equation (3) gives

$$ Z = z_0 + \left\{ -\frac{2 g(z_0)}{f''(z_0)} \right\}^{1/2} \varepsilon^{1/2}. \tag{5} $$

If the quantity in the parenthesis is negative, Z may be complex even though
$z_0$ is real. As an example, suppose that we try to solve

$$ 1.4z^2 - 2.8z + 1.4 = 0, $$

working to two significant figures in the usual formula:

$$ z = \{2.8 \pm [2.8^2 - 4(1.4)^2]^{1/2}\} / 2.8
     = \{2.8 \pm [7.8 - 4(2.0)]^{1/2}\} / 2.8
     = 1.0 \pm 0.16i. $$

The correct answer is that there is a double root z = 1. (In passing we
note that if we know there is a double root, equation (5) suggests that
if a numerical procedure produces two roots that are close together and the
method is such that the errors are correlated - which is often the case - a
much better estimate of the root can be obtained by taking the mean of the
two results. In the above example this gives the exact repeated root z = 1!)
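
The two-figure calculation can be reproduced mechanically. This is a sketch
only: it assumes that the decimal module with two significant figures mimics
the hand computation, and it forms 4ac as 4(ac), as in the worked line above
(forming (4a)c instead rounds differently and changes the result). The function
name is illustrative.

```python
from decimal import Decimal, localcontext

def roots_rounded(a, b, c, digits=2):
    """Apply z = (-b ± sqrt(b*b - 4*(a*c))) / (2a), rounding every
    operation to `digits` significant figures."""
    with localcontext() as ctx:
        ctx.prec = digits
        a, b, c = Decimal(a), Decimal(b), Decimal(c)
        disc = b * b - 4 * (a * c)      # 7.8 - 8.0 for the example
        two_a = 2 * a
        if disc < 0:
            # Rounding has pushed the discriminant below zero: complex pair.
            re = float(-b / two_a)
            im = float((-disc).sqrt() / two_a)
            return complex(re, im), complex(re, -im)
        s = disc.sqrt()
        return float((-b + s) / two_a), float((-b - s) / two_a)

z1, z2 = roots_rounded("1.4", "-2.8", "1.4")
print("computed roots:", z1, z2)
print("mean of the two:", (z1 + z2) / 2)
```

Taking the mean of the two computed roots cancels the correlated errors, which
is the point made in the parenthetical remark.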

An explicit formula is not usually available for the roots of an
equation f(z) = 0, and most methods of solution will depend on the evaluation
of f(z) for various values of z. This is true of all iterative methods, for
instance - the bisection method, the secant method, straightforward iteration
$z_{r+1} = F(z_r)$, Newton's method, and so on. The idea of noise-level is useful
here. For a simple root, the situation is illustrated graphically in Figure
2(a). Whatever method is used, if the accuracy depends on the accuracy of
the evaluation of f(z), the best we will be able to do is to say that the
root lies somewhere in the range PQ, independent of the method used to
find the root. From this point of view, the reason why the situation for a
double root is more serious is illustrated in Figure 2(b). Although the
noise-level is the same as in Figure 2(a), the range of uncertainty PQ
is much greater. If we are unlucky
and rounding errors cause the machine to produce values of f(z) that lie above
the exact curve in Figure 2(b), the machine may report that there is no root
in this region of z.

Figure 2. Noise-level and the accuracy of the determination of roots:
(a) simple root, (b) double root.

We  now  come  to  the  question  of  the  stability  of  the  algorithm  used. 
Consider  solution  of  the  quadratic  equation 

$$ az^2 + bz + c = 0, $$

by means of the usual formula

$$ z = \{ -b \pm (b^2 - 4ac)^{1/2} \} / (2a). \tag{6} $$

Consider

$$ z^2 - 100z + 1 = 0. $$

Working to three significant figures in floating point, formula (6) gives

$$ z = \{100 \pm (100^2 - 4)^{1/2}\} / 2 = 100 \text{ or } 0. $$

The smaller root has been lost altogether because of cancellation of equal
numbers. However if we solve the equation by means of the formulae

$$ z_1 = -\operatorname{sgn} b \cdot \{ |b| + (b^2 - 4ac)^{1/2} \} / (2a), \qquad
   z_2 = c/(a z_1), \tag{7} $$

we obtain $z_1 = 100$, $z_2 = 0.0100$. The relative accuracy of $z_2$ is now good.

In our terminology, (6) is an unstable formula for the numerical solution of
a quadratic, whereas (7) is stable. (This type of example has been overworked,
but this does not affect its value.)
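
The contrast between the two formulae is easy to reproduce. The sketch below
again assumes that the decimal module, here with three significant figures,
is a fair model of the floating-point arithmetic in the text; both functions
assume real roots and a nonzero leading coefficient, and their names are
illustrative.

```python
from decimal import Decimal, localcontext

def roots_formula_6(a, b, c, digits=3):
    """Usual formula (6): z = (-b ± sqrt(b^2 - 4ac)) / (2a)."""
    with localcontext() as ctx:
        ctx.prec = digits
        a, b, c = Decimal(a), Decimal(b), Decimal(c)
        s = (b * b - 4 * a * c).sqrt()
        return (-b + s) / (2 * a), (-b - s) / (2 * a)

def roots_formula_7(a, b, c, digits=3):
    """Formulae (7): the larger root without cancellation, then z2 = c/(a z1)."""
    with localcontext() as ctx:
        ctx.prec = digits
        a, b, c = Decimal(a), Decimal(b), Decimal(c)
        s = (b * b - 4 * a * c).sqrt()
        sgn = -1 if b < 0 else 1
        z1 = -sgn * (abs(b) + s) / (2 * a)   # |b| and s are added, never subtracted
        z2 = c / (a * z1)                    # product of the roots is c/a
        return z1, z2

print("formula (6):", roots_formula_6("1", "-100", "1"))
print("formula (7):", roots_formula_7("1", "-100", "1"))
```

The second formula never subtracts nearly equal quantities, which is why the
small root survives the rounding.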


As  a  second  example  of  the  distinction  between  stable  and  unstable, 
consider  the  straightforward  iteration 

$$ z_{r+1} = F(z_r). $$

It  is  well  known  that  any  given  equation  can  be  arranged  in  this  form  in  many 
different  ways,  some  of  which  give  iterations  that  may  converge  and  some 
diverge.  In  our  terminology  we  say  that  the  convergent  arrangements  are 
stable,  the  divergent  arrangements  are  unstable. 

To conclude this section, consider solution of the quadratic equation

$$ x^2 - 2x + 1.6 = 0 $$

by Newton's method for real roots. The iterative formula is

$$ x_{r+1} = \frac{x_r^2 - 1.6}{2(x_r - 1)}, $$

which gives the following sequence of values, if we start with $x_0 = 1.4$:

    r      1     2     3     4     5     6
    x_r   0.45  1.27  0.00  0.80  2.40  1.49

Other starting values give similar results. It is easy to see why the
iterates oscillate. The quadratic has complex roots, and graphically (Figure
3) the $x_{r+1}$ given by Newton's method is the intersection with the x-axis of
the tangent to the curve at the point on the curve with abscissa $x_r$. Although
a cursory examination of the numerical results might cause us to think that the
iteration is unstable, our trouble is in fact due to uniqueness-existence -
we are trying to find a real root that does not exist.

Figure 3. An oscillatory case of the Newton-Raphson procedure.
The order of points is PQR...Z.
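
The oscillation is easy to reproduce. The sketch below runs the same Newton
iteration in double precision, taking the quadratic to be x^2 - 2x + 1.6 (the
form implied by the iterative formula). Because the tabulated values appear to
have been computed with coarser rounding, the later iterates need not agree
with the table digit for digit; the qualitative behaviour is what matters. The
function names are illustrative.

```python
def newton_step(x):
    """One Newton step for f(x) = x**2 - 2*x + 1.6, i.e.
    x - f(x)/f'(x) = (x**2 - 1.6) / (2*(x - 1))."""
    return (x * x - 1.6) / (2.0 * (x - 1.0))

def newton_iterates(x0, steps):
    """Return [x0, x1, ..., x_steps]."""
    xs = [x0]
    for _ in range(steps):
        xs.append(newton_step(xs[-1]))
    return xs

xs = newton_iterates(1.4, 6)
print("iterates:", [round(x, 2) for x in xs[1:]])
# f(x) = (x - 1)**2 + 0.6 is never smaller than 0.6, so there is no real root.
print("residuals:", [round(x * x - 2 * x + 1.6, 2) for x in xs])
print("discriminant:", (-2.0) ** 2 - 4 * 1.6)
```

A check on the sign of the discriminant before iterating is the kind of
existence test that the discussion in §1 recommends.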


4.  LEAST-SQUARES  SOLUTION  OF  LINEAR  EQUATIONS.  We  begin  by  briefly
summarizing  the  existence-uniqueness  theory  for  a  set  of  simultaneous  linear 
equations  Ax  =  b,  where  A  is  m  x  n.  There  are  three  possibilities.  The 
equations  may  have 

(i)  No  solution. 

(ii)  A  unique  solution. 

(iii)  An  infinity  of  solutions. 

If  an  infinity  of  solutions  exist,  the  general  solution  can  be  written  in 
the  form 


$$ x = x_0 + \sum_{i=1}^{n-r} \alpha_i y_i, \tag{8} $$

where r is the rank of A, and the $y_i$ are solutions of the homogeneous


equations  Ay  =  0.  If  no  solution  exists,  we  often  find  a  solution  that 
minimizes  the  sum  of  squares  of  residuals  r  =  b  -  Ax.  This  least-squares 
solution  can  be  obtained  by  solving 


$$ A^T A x = A^T b. \tag{9} $$

There  are  only  two  possibilities  for  the  solution  of  these  equations  — 
there  may  be  a  unique  solution  or  an  infinity  of  solutions. 

The subject of ill-conditioning and linear equations is a long story
and this is not the place to go into detail. We content ourselves with the
statement that, if A is square and properly scaled, then a small value for
the determinant of A indicates that the equations Ax = b are ill-conditioned.
(The discerning reader will realize that we are trying to disguise the present
unsatisfactory state of the art by not defining what we mean by proper scaling.
It is not sufficient to arrange that the largest element in each row and
column of A be of order unity in magnitude.) The result that we wish to make
plausible, which is well attested by experience, is that if Ax = b is
ill-conditioned, then the condition of the equations $A^T A x = A^T b$ is much worse.
This follows when A is square, if we accept our previous criterion for
ill-conditioning, since $\det A^T A = (\det A)^2$. If det A is small, say $10^{-6}$, then
$(\det A)^2$ is much smaller still.

The  main  point  we  wish  to  illustrate  in  this  section  is  that,  instead 
of  simply  accepting  the  fact  that  the  condition  of  (9)  may  be  much  worse 
than the condition of Ax = b, we can do something about it. In the equations
Ax  =  b,  where  A  is  m  x  n  (m  >  n)  of  rank  n,  partition  A  and  b  in  the  form 


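$$ A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}, \qquad
   b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}, \tag{10} $$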


where $A_1$ is a nonsingular matrix of order n, the choice of which will be
discussed later, and $b_1$ is n × 1. Since $A_1$ is nonsingular, the last m - n
rows of A can be expressed as linear combinations of its first n rows. This
means that we can find a matrix P such that

$$ A_2 = P A_1, \tag{11} $$

i.e.

$$ A = \begin{bmatrix} I \\ P \end{bmatrix} A_1. $$


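On inserting this expression for A, together with b from (10), in the
least squares equations

$$ A^T A x = A^T b, $$

we find

$$ A_1^T [I, P^T] \begin{bmatrix} I \\ P \end{bmatrix} A_1 x
   = A_1^T [I, P^T] \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}, $$

or

$$ A_1^T [I + P^T P] A_1 x = A_1^T [b_1 + P^T b_2]. \tag{14} $$

Since $A_1$ is nonsingular, we can multiply through by $(A_1^T)^{-1}$:

$$ [I + P^T P] A_1 x = b_1 + P^T b_2. \tag{15} $$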


This  is  the  required  form  of  the  least-squares  equations.  We  claim  that  if 
the  set  of  equations 

$$ A^T A x = A^T b $$

is very badly conditioned, the condition of the set (15) will be much better,
provided $A_1$ is chosen properly. Before discussing how to choose $A_1$ we remark
that (15) can be rearranged in the form

$$ A_1 x = b_1 + [I + P^T P]^{-1} P^T [b_2 - P b_1]. \tag{16} $$

If the last m-n equations in Ax = b are simply linear combinations of the first
n equations this means that if P is defined as in (11) then we must also have
$b_2 = P b_1$. This means that the second term on the right of (16) vanishes, and
we find the least-squares solution by simply solving $A_1 x = b_1$, as we should
expect. If the equations Ax = b arise in a physical situation then we should
expect that the last m-n equations would be nearly equal to linear combinations
of the first n, i.e., $b_2$ would be nearly equal to $P b_1$ and the last term in
(16) will be a small correction to $b_1$.
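
As a check on the algebra, the partitioned form (15) can be compared with the
ordinary normal equations on a small example. The sketch below works in exact
rational arithmetic, so it confirms only that the two formulations give the
same least-squares solution; it says nothing about their relative conditioning
in floating point. It is restricted to n = 2 (a 2 x 2 inverse is written out
explicitly), and the matrix, right-hand side and helper names are all
illustrative assumptions.

```python
from fractions import Fraction as F

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(r) for r in zip(*X)]

def least_squares_normal(A, b):
    """Solve A^T A x = A^T b directly (A has two columns)."""
    At = transpose(A)
    return matmul(inv2(matmul(At, A)), matmul(At, b))

def least_squares_partitioned(A, b):
    """Solve [I + P^T P] A1 x = b1 + P^T b2, with A1 the first two rows of A."""
    A1, A2, b1, b2 = A[:2], A[2:], b[:2], b[2:]
    P = matmul(A2, inv2(A1))                      # A2 = P A1
    Pt = transpose(P)
    PtP = matmul(Pt, P)
    M = [[PtP[i][j] + (1 if i == j else 0) for j in range(2)] for i in range(2)]
    Ptb2 = matmul(Pt, b2)
    rhs = [[b1[i][0] + Ptb2[i][0]] for i in range(2)]
    w = matmul(inv2(M), rhs)                      # w = A1 x
    return matmul(inv2(A1), w)

A = [[F(1), F(1)], [F(1), F(2)], [F(1), F(3)], [F(1), F(4)]]
b = [[F(6)], [F(5)], [F(7)], [F(10)]]
x_normal = least_squares_normal(A, b)
x_partitioned = least_squares_partitioned(A, b)
print(x_normal, x_partitioned)
```

In practice the choice of which rows form A_1 matters a great deal; here the
first two rows are taken without any pivoting, purely for brevity.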

The step that improved the condition of $A^T A x = A^T b$ was the multiplication
of (14) by $(A_1^T)^{-1}$ to give (15). Since $I + P^T P$ is positive definite,
with determinant greater than unity, the condition of (15) is determined
essentially by the condition of $A_1$. How do we choose $A_1$?

The  result  of  150  years  work  on  the  numerical  solution  of  simultaneous 
linear  equations  is  that  Gaussian  elimination  is  still  the  best  general 
purpose method if precautions are taken to choose the pivots correctly. In
the terminology of §1, Gaussian elimination is an unstable computing procedure
when  rounding  errors  are  present  unless  the  pivots  are  chosen  in  a  suitable 
way.  The  usual  rule  is  to  use  either  partial  or  complete  pivoting.  We 
illustrate  by  an  example.  Suppose  that  we  are  working  in  floating  point  to 
two  significant  figures.  Consider  the  equations 


$$ \begin{aligned} x_1 - x_2 &= 0, \\ 10^{-2} x_1 + x_2 &= 1, \end{aligned} \tag{17} $$


which have the exact solution $x_1 = x_2 = 100/101$. To solve these numerically
we can use the first equation to eliminate $x_1$ from the second. In more
technical language, we use the coefficient of $x_1$ in the first equation as pivot.
If we multiply the first equation by $10^{-2}$ and subtract from the second, we
obtain $1.01 x_2 = 1$. However we are working to two significant figures, so
1.01 is rounded to 1.0, and this equation gives $\hat{x}_2 = 1$, where a cap is used
to denote "computed value." Back-substitution in the first equation gives
$\hat{x}_1 = 1$, and we have obtained a reasonable approximate solution of the equations.
Suppose however that we pivot on the coefficient of $x_1$ in the second equation
in (17). We multiply the second equation by $10^2$ and subtract from the first.
As before on rounding this gives the computed result $\hat{x}_2 = 1$, but
back-substitution, now in the second equation, gives $\hat{x}_1 = 0$. In this case the computed
solution is no longer a reasonable approximation to the exact solution. The
only difference has been in the choice of pivots, and this illustrates that
the choice of pivots is important. The reader may have gained the impression
that the reason why the first solution was satisfactory, whereas the second
was not, is connected with the fact that the pivot used in the first case (1)
is greater than the pivot used in the second ($10^{-2}$). This is the assumption
behind complete and partial pivoting. Partial pivoting tells us to pick the
largest coefficient of the variable we propose to eliminate, for instance.

It is easy to rescale the set of equations (17) so that partial or complete
pivoting is unsatisfactory. The nub of the matter is that we are working in
floating point so that it is only relative error that is important, whereas
pivotal strategies usually depend on criteria involving absolute magnitudes.
Suppose that we rescale (17) by multiplying the first equation by $10^{-2}$, the
second by $10^2$, and set $x_1 = 10^2 z_1$, $x_2 = 10^{-2} z_2$. The equations become

$$ \begin{aligned} z_1 - 10^{-4} z_2 &= 0, \\ 10^2 z_1 + z_2 &= 10^2. \end{aligned} \tag{18} $$

If we pivot on the "large" coefficient $10^2$ we find, working to two significant
figures in floating point, $z_2 = 10^2$, $z_1 = 0$, which gives $x_2 = 1$, $x_1 = 0$, i.e.,
the same unsatisfactory solution found by pivoting on the corresponding
coefficient in (17). (It is easy to see that, for a given choice of pivots,
rescaling by powers of 10 will not affect the relative rounding errors.)
However if we pivot on the "small" coefficient $10^{-4}$ in (18) we obtain the
satisfactory approximate solution $\hat{z}_1 = 10^{-2}$, $\hat{z}_2 = 10^2$, $\hat{x}_1 = \hat{x}_2 = 1$.
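
The two-figure eliminations for (17) and (18) can be replayed as follows. This
is a sketch under stated assumptions: the decimal module with two significant
figures stands in for the floating-point arithmetic, and the illustrative
function below always eliminates the first unknown, using whichever row is
named as the pivot row. For the rescaled system (18) it therefore compares the
pivots 1 and 10^2 in the first column, rather than the pivot 10^{-4} discussed
in the text; the "largest coefficient" rule picks 10^2 and gives the poor
answer.

```python
from decimal import Decimal, localcontext

def eliminate_2x2(rows, pivot_row, digits=2):
    """Solve a 2x2 system by eliminating the first unknown with the chosen
    pivot row, rounding every operation to `digits` significant figures.
    rows = [(a11, a12, b1), (a21, a22, b2)]; returns (first, second unknown)."""
    with localcontext() as ctx:
        ctx.prec = digits
        rows = [tuple(Decimal(v) for v in r) for r in rows]
        p, q = pivot_row, 1 - pivot_row
        m = rows[q][0] / rows[p][0]              # multiplier
        a = rows[q][1] - m * rows[p][1]          # reduced coefficient
        r = rows[q][2] - m * rows[p][2]          # reduced right-hand side
        second = r / a
        first = (rows[p][2] - rows[p][1] * second) / rows[p][0]  # back-substitution
        return first, second

system_17 = [("1", "-1", "0"), ("0.01", "1", "1")]
system_18 = [("1", "-0.0001", "0"), ("100", "1", "100")]

print("(17), pivot 1      :", eliminate_2x2(system_17, 0))
print("(17), pivot 10^-2  :", eliminate_2x2(system_17, 1))
print("(18), pivot 1      :", eliminate_2x2(system_18, 0))
print("(18), pivot 10^2   :", eliminate_2x2(system_18, 1))
```

Recall that in (18) the unknowns are z_1 and z_2, so a computed pair should be
converted with x_1 = 10^2 z_1, x_2 = 10^{-2} z_2 before comparing it with the
solution of (17).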


The  moral  of  this  discussion  is  that  success  of  partial  or  complete 
pivoting  depends  on  proper  scaling.  Various  arguments  indicate  that  it  is 
reasonable to scale so as to minimize the condition number $\|A\| \, \|A^{-1}\|$.
This can be done for the infinity-norm, for instance, by arranging that
the absolute row sums of A are the same, and those of $A^{-1}$ are the same
(F.L. Bauer). For further discussion and references to the work of Wilkinson,
Bauer, and others, see [2].


The question of pivotal strategy is relevant to the choice of $A_1$ in the
decomposition (10) used in the method suggested above for avoiding
ill-conditioned least-squares equations. If we use either partial or complete
pivoting to reduce A to row-echelon form this will single out n rows of A.
We choose $A_1$ to consist of these n rows. The value of det $A_1$ is the product of
the pivots. By using partial or complete pivoting we are trying to choose an
$A_1$ whose determinant is as large as possible. This should be the submatrix
of order n from A that is as well-conditioned as possible.

5.  ORDINARY  DIFFERENTIAL  EQUATIONS.  A  great  deal  is  known  about  the 
existence  and  uniqueness  of  solutions  of  ordinary  differential  equations. 
Rather  than  go  into  detail  we  simply  quote  [3],  pp.  15,  112,  347,  for  typical 
theorems  that  are  likely  to  be  useful  when  computing.  We  also  remind  the 
reader of some simple examples where the conditions for existence or uniqueness
of solutions of y' = f(x,y) are not satisfied. The equation

$$ y' = 1 + y^2 $$

has the solution y = tan(x + c), where c is an arbitrary constant, and this
solution does not exist when $x = (n + 1/2)\pi - c$. The equation

$$ y' = |y|^{1+\varepsilon}, \quad y(0) = 1, \quad \varepsilon > 0, $$

has the solution $y(x) = (1 - \varepsilon x)^{-1/\varepsilon}$, which ceases to exist at $x = 1/\varepsilon$.
If

$$ y' = |y|^{1-\varepsilon}, \quad y(0) = 0, \quad \varepsilon > 0, $$

we have an infinity of solutions:

$$ y(x) = 0, \quad 0 \le x < c, \qquad y(x) = [\varepsilon(x - c)]^{1/\varepsilon}, \quad x \ge c, $$

for arbitrary c > 0.

Existence  and  uniqueness  questions  arise  when  resonance  occurs  in  a 
physical  system.  A  simple  example  occurs  in  connection  with 


$$ y'' + \lambda y = f(x), \quad 0 \le x \le \pi, \quad y(0) = y(\pi) = 0. $$

If $\lambda = n^2$, for integral n, the homogeneous equation has the solution y = sin nx.
The situation then is: Let

$$ k = \int_0^{\pi} f(x) \sin nx \, dx. $$

There are two possibilities:

(i) If k = 0 the equation has an infinity of solutions.

(ii) If k ≠ 0 the equation has no solutions.

We  next  make  some  remarks  about  ill-conditioning.  A  typical  situation 
is  that  small  changes  in  the  initial  conditions,  in  an  initial-value  problem, 
produce  large  changes  in  the  answer.  We  consider  an  example  where  this  is 
caused  by  the  presence  of  exponential  solutions.  Consider 

$$ y' = \alpha y + (\beta - \alpha) e^{\beta x}, \qquad y = y_0 \text{ at } x = 0. \tag{19} $$

The general solution is

$$ y = (y_0 - 1) e^{\alpha x} + e^{\beta x}. \tag{20} $$

If $y_0 = 1$ then $y = e^{\beta x}$, and if $y_0 = 1 + \varepsilon$ we have
$y = \varepsilon e^{\alpha x} + e^{\beta x}$. If α > β
we will have $e^{\alpha x} \gg e^{\beta x}$ for large enough x, and the first term on the right
of (20) will dominate the second for large x, no matter how small ε is. As
an example, if α = 10, β = -1, we find

    $y_0 = 1$:       $y = e^{-x}$,                     y(1) ≈ 0.37,
    $y_0 = 1.0001$:  $y = e^{-x} + 0.0001 e^{10x}$,    y(1) ≈ 2.57.

A change of 1 in $10^4$ in the initial condition produces a change of 7 to 1 in
the solution at x = 1, and the difference is even more catastrophic for larger
x. The problem is obviously ill-conditioned. We have already said that if
a problem is ill-conditioned we should try to reformulate it in some way so
that the ill-conditioning is removed. It is possible to do this in the
present case if we know that the solution tends to zero as x tends to infinity.
We can use this to replace the initial condition. Thus

$$ y' = 10y - 11e^{-x}, \qquad y(0) = 1, \tag{21} $$

is an ill-conditioned problem, but

$$ y' = 10y - 11e^{-x}, \qquad y \to 0 \text{ as } x \to \infty, \tag{22} $$

is well-conditioned. This last problem can be solved satisfactorily by
integrating back from large x towards the origin. The two formulations
(21) and (22) are mathematically equivalent.
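
Both halves of this argument can be illustrated numerically. The sketch below
uses equation (19) with α = 10, β = -1; it evaluates the closed-form solution
(20) for the two initial values, and then integrates backwards from x = 10
with y = 0 as a stand-in for the condition at infinity. The classical
fourth-order Runge-Kutta method, the starting point x = 10 and the number of
steps are all assumptions made for illustration; the text does not prescribe
a particular integration scheme.

```python
import math

ALPHA, BETA = 10.0, -1.0

def exact(x, y0):
    """Solution (20) of y' = ALPHA*y + (BETA - ALPHA)*exp(BETA*x), y(0) = y0."""
    return (y0 - 1.0) * math.exp(ALPHA * x) + math.exp(BETA * x)

def rhs(x, y):
    return ALPHA * y + (BETA - ALPHA) * math.exp(BETA * x)

def rk4(x, y, x_end, steps):
    """Classical Runge-Kutta from x to x_end; the step is negative if x_end < x."""
    h = (x_end - x) / steps
    for _ in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + h / 2, y + h * k1 / 2)
        k3 = rhs(x + h / 2, y + h * k2 / 2)
        k4 = rhs(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

print("y(1) for y0 = 1      :", exact(1.0, 1.0))
print("y(1) for y0 = 1.0001 :", exact(1.0, 1.0001))
# Backward integration: the growing solution exp(10x) now decays, so the
# crude end condition y(10) = 0 has a negligible effect at the origin.
print("y(0) from backward integration:", rk4(10.0, 0.0, 0.0, 1000))
```

Integrating in the backward direction turns the troublesome growing
exponential into a rapidly decaying one, which is why the reformulated problem
behaves so much better.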


We  next  turn  our  attention  to  computational  difficulties,  not  present 
in  the  original  differential  equation,  but  introduced  by  the  difference 
scheme  used  to  solve  the  equation  numerically.  In  this  connection  the  word 
"instability"  is  used  in  a  technical  sense,  for  details  of  which  we  refer 
the reader to the literature [3], [4]. Roughly speaking, a difference scheme
is  said  to  be  unstable  if  it  introduces  spurious  solutions  that  are  not 
present  in  the  original  problem,  and  these  dominate  the  solution  we  want  to 
find,  if  we  are  integrating  over  a  fixed  range  of  x,  and  we  let  the  step-size 
in  the  difference  scheme  tend  to  zero.  This  type  of  difficulty  is  now  well 
understood  and  will  not  be  considered  further  here. 

A common source of trouble is illustrated by the example

$$ y'' + 101y' + 100y = 0, \qquad y = 0, \ y' = 99, \text{ at } x = 0. \tag{23} $$

The exact solution is

$$ y = e^{-x} - e^{-100x}. $$

Two common difficulties when trying to compute this solution are:

(i) The programmer realizes that the solution $e^{-100x}$ is negligible
whenever x is greater than about 0.05, leaving only $e^{-x}$. He therefore
adjusts the step-length h to be suitable for this part of the solution taking,
perhaps, h = 0.02. However the step-length that must be used in the standard-type
difference formula is still controlled by the $e^{-100x}$ term, even though
this is negligible in the actual solution.

(ii) The programmer realizes that the $e^{-100x}$ term controls the step-length,
so he takes h to be, say, 0.0001. Then he complains that the computation takes
an interminable length of time on a digital computer.

This is a typical boundary-layer problem. The highest order derivative
is important only over a small part of the range. In this example, one answer
would be to take short steps from x = 0 to $x = x_1$, say, where $x_1$ is chosen
so that the contribution from the $e^{-100x}$ term is negligible. Suppose that we
find that $y = y_1$ at $x = x_1$. If we did not know the exact situation, we might
then simply drop the y'' term in (23), having checked computationally that it
is small compared with the other two terms, and solve:

$$ 101y' + 100y = 0, \qquad y = y_1 \text{ at } x = x_1, $$

which would give a reasonable approximation to the correct solution quite
quickly. The calculation can be checked by varying the point $x_1$ at which
the changeover occurs.
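
The step-length restriction described in (i) can be seen with the simplest
possible scheme. The sketch below applies the forward Euler method to (23),
rewritten as a first-order system; forward Euler is chosen here only as a
convenient example of a "standard-type" explicit formula and is an assumption,
not the scheme used in the text. For this method the rapidly decaying
component is amplified by the factor 1 - 100h at each step, so the computation
blows up once h exceeds 0.02.

```python
import math

def euler_23(h, x_end=1.0):
    """Forward Euler for y'' + 101 y' + 100 y = 0, y(0) = 0, y'(0) = 99,
    written as the system y' = v, v' = -101 v - 100 y. Returns y(x_end)."""
    y, v = 0.0, 99.0
    for _ in range(round(x_end / h)):
        y, v = y + h * v, v + h * (-101.0 * v - 100.0 * y)
    return y

def exact_23(x):
    return math.exp(-x) - math.exp(-100.0 * x)

print("exact y(1)         :", exact_23(1.0))
print("Euler, h = 0.001   :", euler_23(0.001))
print("Euler, h = 0.05    :", euler_23(0.05))
```

The run with the larger step fails even though the component that causes the
trouble is negligible in the true solution at x = 1, which is exactly the
difficulty described above.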

Boundary layer phenomena often occur in connection with boundary value
problems. A typical example is:

$$\varepsilon y'' + y = 0, \quad \varepsilon > 0, \quad y(0) = 0, \quad y(1) = 1.$$

For small $\varepsilon$ the solution is almost zero except near $x = 1$. A more
unusual example is:

$$\varepsilon (y'')^2 + x y' - y = 0, \quad \varepsilon > 0, \quad y(1) = y(-1) = 1. \qquad (24)$$

Neglecting the second derivative we have $x y' - y = 0$, with solution $y = Cx$,
where $C$ is an arbitrary constant. The solution must be symmetrical about the
y-axis, and the first possibility that suggests itself is that $y = 0$ over
most of the range, with boundary layers at the end-points. However, this
would mean that near $x = 1$ the value of $y'$ would be large and positive, which
is not possible, since (24) would then imply that $\varepsilon (y'')^2$ is
negative. It turns out that the solution is approximately $y = x$ in most of
$0 < x \le 1$, $y = -x$ in most of $-1 \le x < 0$, and these solutions are joined
by a "corner layer" near $x = 0$. I am indebted to Carl Pearson for this example.

He has also made the sensible remark that in many of these problems an effective
computational procedure, if $\varepsilon = 10^{-10}$, say, is to compute a series
of solutions with $\varepsilon = 10^{-2}, \ldots$, in turn. The computations with
the larger $\varepsilon$ will be less difficult, and they will provide successive
guides that tell us where boundary layers are developing, and how sharp they are.
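
The idea of approaching a small parameter through a sequence of larger ones can
be sketched as follows. The model problem used here, eps*y'' - y = 0 with
y(0) = 0, y(1) = 1, is an assumption made for illustration (it is not one of
the equations above); it was picked because its layer near x = 1 is easy to
measure. The grid size, the eps values, and the function names are likewise
hypothetical.

```python
def solve_bvp(eps, n=2000):
    """Central-difference solution of the ASSUMED model problem
    eps*y'' - y = 0, y(0) = 0, y(1) = 1, on n equal intervals.
    The tridiagonal system is solved by the Thomas algorithm."""
    h = 1.0 / n
    m = n - 1                      # number of interior unknowns
    off = eps / (h * h)            # sub- and super-diagonal entries
    diag = -2.0 * off - 1.0        # main diagonal entries
    rhs = [0.0] * m
    rhs[-1] = -off * 1.0           # boundary value y(1) = 1 moved to the right side

    # Forward elimination.
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = off / diag
    d_prime[0] = rhs[0] / diag
    for i in range(1, m):
        denom = diag - off * c_prime[i - 1]
        c_prime[i] = off / denom
        d_prime[i] = (rhs[i] - off * d_prime[i - 1]) / denom

    # Back substitution.
    y = [0.0] * m
    y[-1] = d_prime[-1]
    for i in range(m - 2, -1, -1):
        y[i] = d_prime[i] - c_prime[i] * y[i + 1]
    return [0.0] + y + [1.0]       # include the boundary values

def layer_width(y):
    """Distance from x = 1 to the first grid point where y exceeds 0.5."""
    n = len(y) - 1
    for i, value in enumerate(y):
        if value > 0.5:
            return 1.0 - i / n
    return 0.0

if __name__ == "__main__":
    # Larger eps first: each run shows where the layer sits and how sharp it is.
    for eps in (1e-1, 1e-2, 1e-3, 1e-4):
        print(eps, layer_width(solve_bvp(eps)))
```

Each run with a larger eps is cheap and indicates how fine a mesh the next,
sharper layer is likely to need.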

To conclude this section we draw the reader's attention to a quite
different type of example considered in detail in [6], and in a simplified
form in [5], Chap. 12. A chemical reaction involving three species with
concentrations $a_i$, $i = 1, 2, 3$, is governed by a system of three linear
differential equations in three unknowns:

$$\frac{da_1}{dt} = -(k_{21} + k_{31}) a_1 + k_{12} a_2 + k_{13} a_3 \qquad (25)$$

with two similar equations for $da_2/dt$, $da_3/dt$. The solution of these
equations is known to be of the form

$$a_1 = c_{11} + c_{12} e^{-\gamma t} + c_{13} e^{-\nu t}, \qquad (26)$$

with similar expressions for $a_2$ and $a_3$. The concentrations $a_i$ can be
measured experimentally for various values of the time. It is required to find
the rate constants $k_{ij}$ that appear in (25). The most obvious procedure is
to use curve fitting with exponentials to deduce from (26) the values of
$\gamma$, $\nu$, and the $c_{ij}$. Then deduce the $k_{ij}$ from the fact that
(26) is the solution of (25).

Unfortunately fitting of exponentials is an ill-conditioned procedure. It
turns out that if we perform a detailed analysis of the relation between (25)
and (26), a procedure can be devised that will enable the experimentalist to
design his experiment in such a way that he can find initial concentrations
such that either $c_{12}$ or $c_{13}$ is zero in (26). It is then possible to
deduce the rate constants by a well-conditioned procedure. Details can be found
in the references. From our point of view the moral is again that if one method
for performing a calculation is ill-conditioned we should look for an equivalent
well-conditioned procedure.
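
The ill-conditioning of exponential fitting can be seen in a small experiment.
The sketch below uses hypothetical decay rates and sample times (none are given
in the text) and looks only at the easier linear sub-problem of finding two
coefficients when the rates are already known. It computes, in closed form, the
condition number of the 2x2 normal-equation (Gram) matrix.

```python
import math

def gram_condition_number(rate_a, rate_b, times):
    """2-norm condition number of the 2x2 Gram matrix of the basis
    functions exp(-rate_a*t) and exp(-rate_b*t) sampled at `times`."""
    g11 = sum(math.exp(-2.0 * rate_a * t) for t in times)
    g22 = sum(math.exp(-2.0 * rate_b * t) for t in times)
    g12 = sum(math.exp(-(rate_a + rate_b) * t) for t in times)
    # Eigenvalues of the symmetric matrix [[g11, g12], [g12, g22]]
    # are mean +/- radius.
    mean = 0.5 * (g11 + g22)
    radius = math.hypot(0.5 * (g11 - g22), g12)
    return (mean + radius) / (mean - radius)

# Hypothetical sample times t = 0, 0.1, ..., 2.0.
TIMES = [0.1 * k for k in range(21)]

if __name__ == "__main__":
    print("rates 1 and 10 :", gram_condition_number(1.0, 10.0, TIMES))
    print("rates 1 and 1.1:", gram_condition_number(1.0, 1.1, TIMES))
```

When the two rates are close, the two sampled exponentials are nearly parallel
and the condition number grows sharply, so small errors in the measured
concentrations can produce large changes in the fitted coefficients. Arranging
for one coefficient to vanish removes this near-dependence from the fit.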


6. CONCLUDING REMARKS. We have illustrated the existence-uniqueness,
ill-conditioning, and instability classification of difficulties by discussing
various aspects of three types of problem: polynomial equations, least-squares
solution of linear equations, and ordinary differential equations. We could
equally well have illustrated the classification by discussing other standard
problems in numerical analysis: eigenvalues and eigenvectors, approximation
theory, partial differential equations, integration, integral equations.

In the lecture from which this paper originated, the three-way
classification was also characterized as follows:

(1) Ignorance: If we try to find a real root of a polynomial when all
the roots are complex, this is simply ignorance of the existence-uniqueness
situation.

(2) Cussedness: Ill-conditioned problems are inherently troublesome;
the difficulty stems from the nature of the problem, and often there is
little we can do about the problem as it stands. The best remedy is to
circumvent our difficulties.

(3) Stupidity: Instability troubles are usually due to the fact that
we are not clever enough to choose the correct computational method. Perhaps
this is rather a harsh term to apply to situations where foolproof computational
methods are not yet known, such as the choice of pivots in Gaussian elimination.

Briefly, if a problem gives trouble, we must first decide whether we are
simply ignorant of the existence-uniqueness theory. If we are sure that we
are looking for the correct type of solution, we must decide whether the
problem itself is cussed (in which case it is probably best to try to
reformulate it) or whether we have simply been stupid in our choice of method.
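
The difference between a cussed problem and a poor choice of method can be
illustrated with the choice of pivots in Gaussian elimination. The 2x2 system
below is a standard textbook illustration, not an example from this paper: the
problem is well-conditioned, yet elimination without a row interchange loses
the answer in double-precision arithmetic. The function name and the numbers
are hypothetical.

```python
def solve_2x2(a, b, pivot):
    """Gaussian elimination on a 2x2 system a*x = b.
    If `pivot` is true, rows are swapped so that the larger
    first-column entry is used as the pivot (partial pivoting)."""
    (a11, a12), (a21, a22) = a
    b1, b2 = b
    if pivot and abs(a21) > abs(a11):
        a11, a12, a21, a22 = a21, a22, a11, a12
        b1, b2 = b2, b1
    m = a21 / a11                  # elimination multiplier
    a22 -= m * a12
    b2 -= m * b1
    x2 = b2 / a22                  # back substitution
    x1 = (b1 - a12 * x2) / a11
    return x1, x2

# Tiny pivot: the true solution is very close to x1 = x2 = 1.
A = [[1e-20, 1.0], [1.0, 1.0]]
B = [1.0, 2.0]

if __name__ == "__main__":
    print("no pivoting  :", solve_2x2(A, B, pivot=False))
    print("with pivoting:", solve_2x2(A, B, pivot=True))
```

Here the trouble is an instability of the method, not a property of the
problem, since interchanging the rows is enough to recover an accurate answer.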

My own experience is that this procedure has been useful when trying to
track down sources of trouble. But when all is said and done, and the
source of difficulty has been located, the most appropriate comment is often
1 Corinthians, Chap. I, v. 27: "God hath chosen the foolish things of the
world to confound the wise."
monitor to forget the last reference to KICKED whenever overlay occurs.
We  take  no  pride  in  this  expedient. 

Any programmer who is aware of these two limitations can easily code
around them. Simple suggestions are contained in the PRM. Indeed, the
limitations are so easy to circumvent that programmers sometimes forget
to do so, and for this reason we have included a warning message like the
one in the following example:

    0.0/0.0 ERROR AT 14506

    EXECUTION TERMINATED.

    ERROR-TRACE WITH CALLS IN REVERSE ORDER    CODE 25

    CALL IS IN      AT IFN OR      ABSOLUTE
    DECK NAMED      LINE NO.       LOCATION

    SUB2                           14513
    SUB1                           07762
    MAIN                           05413

    EXECUTING IFN/LINE NO. 2 OF 'SUB1' AFTER PROGRAM
    WAS KICKED OFF. FROM NOW ON IN 'SUB1', THE VALUE
    OF A SUBSCRIPTED VARIABLE WITH VARIABLE SUBSCRIPT,
    OR THE EXECUTION OF A COMPUTED 'GO TO' OR 'DO'
    STATEMENT WITH VARIABLE PARAMETER, MAY BE
    INCORRECT UNLESS THE RELEVANT INDEX IS RESET.

    SEE THE PROGRAMMERS' REFERENCE MANUAL.

This message is more formidable than necessary. It would be
unnecessary altogether if the IF(KICKED(OFF)) statement were implemented
in a language, like ALGOL, with a block structure. Then kick-off
within a block would cause control to be transferred to the last KICKED
reference, if any, executed in the same block but not in a deeper sub-block.

One other complication would arise were the IF(KICKED(OFF)) statement
to be implemented within a compiler which contained a MONITOR
statement. Such a statement is exemplified by

    MONITOR X, Y(*), Z(*,3), PROG, n

which would cause output of the following kind to be generated:

Whenever the variable X is changed, write out its new value;

    X = 14.271434 .